PREFACE


This book is designed as a beginning one year textbook in modern 
statistics with elementary calculus a prerequisite. It should be useful for 
anyone wanting to learn statistics starting with first principles. It is an 
expanded version of lecture notes used in a two-quarter course (three lectures 
per week) in statistics taught to juniors, seniors and first year graduate 
students in all areas of science and engineering at Virginia Polytechnic 
Institute and the University of Virginia. The material was also used several 
times with industrial research groups. 

In writing this book I have tried to keep a balance between mathematical 
(theoretical) statistics and applied statistics. Many of the concepts are 
introduced by examples from applied statistics after which the concepts are 
formulated in mathematical terms and given a theoretical treatment. By 
presenting the material in this way, it is hoped that the reader will gain some 
real insight into the nature of statistics and at the same time learn how to 
apply the statistical procedures to actual experimental situations. 

There are good statistical books for research workers in science and 
engineering, but generally speaking they are not very useful as a first
introduction to statistics. In the first chapters of this book the reader is given
an introduction and grounding in the foundations of statistics along with 
many examples and exercises. After this, some of those topics which engi- 
neers and scientists find most useful are introduced and developed. It is not 
intended that the topics selected for discussion be given an exhaustive 
treatment, but rather that they be developed to the point where they are 
useful to the practitioner of statistics. For the reader interested in special
techniques or more sophisticated methods, some exercises and references 
have been given at the end of each chapter. (For example, very little is said 
about sampling techniques, quality control and acceptance sampling. There 
are already good textbooks in these areas.) 

The examples are selected to illustrate the principles presented. That is, 
they are selected so that the principles of statistics can be understood with- 
out special knowledge of a particular subject matter field. These so-called 
common-sense examples refer to such things as heights, weights, tensile 
strengths, teaching methods and scores, and measures of objects which 
should be familiar to readers with at least two years of college. 

Each chapter contains a copious supply of exercises which are applicable
to many and widely different fields of science and engineering. Any reader
should find numerous exercises of special interest. The exercises are designed
to give the reader practice in applying the material presented, to extend both
his practical and theoretical concepts of statistics, and to encourage him to
look carefully at new concepts which are outlined only in exercises.

The book may be used as a text for either a theoretical or applied course
in statistics. As a theoretical text, emphasis should be given to Chaps. 3, 4,
and 5, to those sections of the remaining chapters which deal primarily with
the mathematical development, and to those exercises which stress proofs
and new concepts. As an applied text, the proofs and mathematical development
may be cut considerably and the sections and exercises on application
stressed.

This is a book on statistics. It should be useful to anyone learning the
problems of numerical analysis in experimentation or planned investigations.
The book should also be useful for anyone seeking some of the foundations
of statistics and the way in which part of the statistical structure may be
developed.

Many groups and individuals have helped in developing this book.
First, I wish to gratefully acknowledge my thanks to Dr. John E. Freund
for reading the manuscript and making numerous helpful suggestions, many
of which have been included. I wish to acknowledge my gratitude to the
several classes of students who read and made helpful suggestions on parts
of the manuscript, to the United States Weather Bureau for data used in
several exercises, to friends in the Celanese Corporation of America for
data (coded) and advice on special problems, to friends in the West Virginia
Pulp and Paper Company, United States Steel, and White Sands Missile
Range for the opportunity to see applications to special problems, to the
Statistics Department of Virginia Polytechnic Institute for typing several
chapters of the first draft of the manuscript, to my colleagues in the Mathematics
and Psychology Departments at Hollins College for valuable criticism
and suggestions, to Miss Margaret Shinnick for reading most of the manuscript
and commenting on style and content, to David and Suellen Wine,
my son and daughter, for obtaining the data in Tables 5.2 and 5.5, and to
many other persons who have published data which are reproduced and
acknowledged at the appropriate places in the text. Further, I am indebted
to the Danforth Foundation for a travel grant and to the personnel of the
Hollins College library for their kind assistance on numerous occasions.

Finally, I wish to express my appreciation to Professor E. S. Pearson for
his kind permission to reproduce Tables III, IV and VII from Biometrika,
Table IX from Biometrika Tables for Statisticians, and Table X from Tables
of the Ordinates and Probability Integral of the Distribution of the Correlation
Coefficient in Small Samples, and to McGraw-Hill, Inc. for kind permission
to reproduce Table VIII. I am indebted to Professor Sir Ronald A. Fisher,
F.R.S., Cambridge and to Dr. Frank Yates, F.R.S., Rothamsted, also to
Messrs. Oliver and Boyd Ltd., Edinburgh, for permission to reprint Table 
VI from their book Statistical Tables for Biological, Agricultural and Medical 
Research. Also, I wish to express my appreciation for permission to reproduce 
Tables 8.1, 8.2 and 10.9 from the Journal of the American Statistical Asso- 
ciation; Tables 8.3 and 8.6 from books published by McGraw-Hill, Inc.; 
Tables 10.10 and 16.3 from Biometrics; Table 16.1 from a publication by the 
American Cyanamid Company; and Table 16.5 from the Annals of Mathe- 
matical Statistics. 


R. LOWELL WINE 

Roanoke, Virginia 
INTRODUCTION


1.1. A DEFINITION OF STATISTICS

Modern statistics is a new and vigorous discipline. It is so new that 
some of the men who were most instrumental in establishing statistics as 
it is known today are still actively engaged in research and teaching. Its
vigor can be attested from the fact that statistics is growing so rapidly that
it is impossible to incorporate many of the latest techniques in a textbook,
for by the time the last section is written the first chapters already need
revision.

Statistics is playing an increasingly important role in research activities. 
For this reason it is necessary that special training in statistics be given as 
early as possible so that experimentation and scientific investigations do 
not suffer. The study of statistics should not be viewed as just another area 
of study which is merely desirable for the scientist and engineer; instead, 
statistics should be viewed as a very sensitive instrument which is capable 
of successfully coping with many of the most difficult problems posed by 
modern investigations. Ignoring the use of statistics in many of our research 
activities today should no more be tolerated than that of ignoring tractors 
and combines in the wheat fields of Kansas or of ignoring the latest drugs 
in the treatment of ailments. 

The term “statistics” is old, but its present-day interpretation is very 
young. The term no longer simply refers to the collection and compilation 
of data; instead, statistics is the science of decision-making in
the face of uncertainty. It has to do with both the deductive and the inductive
process, that is, both mathematical and scientific procedures. Statistics 
currently deals with the theoretical development and application of methods 
suitable to numerical measurements. 


Whenever data are collected, statistical methods may be used. In fact,
anyone who attempts to work with data acts like or has occasion to act
like a statistician. Statistics is a science, based upon mathematics, which
deals with such problems as (1) planning a program or an experiment so
that reliable conclusions can be drawn from the data,
(2) tabulating and analyzing the data, (3) deciding what interpretations and
conclusions can properly be drawn from the data, (4) determining to what
extent the conclusions are reliable, and (5) justifying by mathematics the
methods used in (1), (2), (3), and (4). Statistical methods are those procedures
used in designing and planning experiments and in collecting, analyzing,
and interpreting data. Statistical theory has to do with the mathematical
development and justification of the methods used.

Statistical methods may be thought of as falling in two classes. Those
methods which are used more meaningfully to describe a set of data but
which do not involve generalizations are commonly called descriptive statistical
methods. Those methods which are used on a relatively small set of data
to generalize concerning the nature of a much larger set of possible data
make up methods of statistical inference.

Descriptive statistical methods or, simply, descriptive statistics, include
those methods which are used in making and describing such well-known
objects of our everyday experience as graphs, charts, and tables. Such
examples as the batting average of leading hitters, defense-spending graphs,
airplane travel charts, stock market averages, census figures, production
of automobiles by months, and the index of living costs represent only a
few of the illustrations of descriptive statistics we see regularly. Thus, many
of the results and techniques of descriptive statistics are known to most
of us.

The methods of statistical inference are not so well known, even though
illustrations of their use are fairly common. We read, for example, that
the Gallup poll makes a survey and predicts that Joe Brown will be elected
governor instead of Sam Jones, or that Kinsey makes some inference about
the sex habits of the American female, or that it has been proved that one
brand of cigarettes contains less tar than other brands, or that a manufacturer
claims that the average life span of a certain type of light bulb is 2500 hours.
In each case, we read, and perhaps take issue with, the conclusion, but
we know little or nothing about the methods used in arriving at these inferred
statements.

The student is cautioned against thinking that methods of descriptive
statistics and statistical inference are always distinct and clearly defined.
As a matter of fact, most methods which are used in descriptive statistics
are also applied in statistical inference. The two terms are generally used
with reference to the kinds of problems we wish to consider, not with reference
to particular formulas or series of formulas. For example, the “average
number of defective parts” of a manufactured article might be used in
either sense. The term “average” may describe the number of defective parts
manufactured on a given day, or it may be used to infer the number of
defective parts which will be produced per day during the remainder of
the year.
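
As a minimal sketch of the two senses of "average," assume a hypothetical record of defective parts counted on five working days. The counts and the figure of 200 remaining working days are invented for illustration and do not come from the text. The same number first describes the days observed and then serves as the basis of an inference about days not yet observed.

```python
# Hypothetical daily counts of defective parts (illustrative values only).
defects_per_day = [4, 7, 5, 6, 3]

# Descriptive use: the average summarizes the days actually observed.
average = sum(defects_per_day) / len(defects_per_day)
print(f"Average defective parts per observed day: {average}")

# Inferential use: the same average is taken as an estimate for days
# not yet observed (an assumed 200 remaining working days).
remaining_days = 200
projected_total = average * remaining_days
print(f"Projected defective parts for the rest of the year: {projected_total}")
```

The arithmetic is identical in both uses. Only the second carries uncertainty, and it is that uncertainty which the methods of statistical inference are meant to measure.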

Since statistics in some way touches so many of our daily activities, 
it is not possible to give a descriptive and short definition of the term. How- 
ever, for our purposes it is probably adequate to think of statistics as both 
a pure and an applied science which is involved in creating, developing, and 
applying procedures in such a way that the uncertainty of inferences may be 
evaluated in terms of probability. It should be noted that deductive tech- 
niques, as used in mathematics, are required in developing the procedures. 

1.2. SCIENTIFIC METHOD AND APPLIED STATISTICS 

The student will soon realize that the selection and application of some 
statistical methods, particularly those used in the analysis and design of 
experiments, are similar to what the scientist and engineer do when setting 
up a hypothesis, planning and conducting an experiment, and testing the 
hypothesis by using the experimental data. In both the scientific and statisti- 
cal disciplines one is concerned with such things as planning and analyzing 
an experiment in such a way as to establish a fact (the hypothesis) within 
the framework of a specific theory. In addition to this, the statistical dis- 
cipline generally requires that a measure of the degree of uncertainty, in 
terms of probability, accompany the inference drawn from the experiment. 

Even if the experimenter understands the theoretical framework and 
knows specifically what hypothesis he wishes to test, it is still not always 
obvious which collection of statistical techniques (methods) should be 
applied. For just as there are many ways to get from Denver to Boston, 
say, there are usually many ways to collect and use data statistically to 
justify a statement within a fixed theoretical framework. The investigator
normally first looks for a relatively short and relatively simple procedure,
but these are not the primary considerations. He wants to draw the correct
conclusion within the framework of the experiment in the most efficient
way, taking into account, among other things, time, money, and relative
importance of the investigation. Thus, in addition to knowing several
alternative statistical routes, the investigator must make a decision about
the best one to select. In other words, he must decide which “statistical 
model,” including a group of accompanying techniques, to use in the ex- 
periment, it being understood that a model is an idealization of a particular 
experimental situation. 

There are many statistical models which may, or may not, be useful
in solving real problems. For example, the normal curve may be used as
a model in describing the distribution of the grades of all beginning mathematics
students at a large engineering school. A model is not always so
simple, nor is it always used in such a straightforward way, but the objective
in using a model is always the same: to make the concept, analysis, and
conclusion simpler to understand and to disseminate. Once the model
is selected, the resulting conclusion can be relied on only to the extent that
the model approximates the situation being studied.
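
To make the idea of a model concrete, the sketch below treats grades as if they followed a normal curve. The mean of 70 and standard deviation of 10 are assumed values chosen only for illustration; `NormalDist` is taken from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Assumed model: grades follow a normal curve with mean 70 and
# standard deviation 10 (illustrative parameters, not from the text).
grade_model = NormalDist(mu=70, sigma=10)

# Under the model, the proportion of students scoring between 60 and 80
# is the area under the curve between those two grades.
proportion = grade_model.cdf(80) - grade_model.cdf(60)
print(f"Modeled proportion of grades between 60 and 80: {proportion:.3f}")
```

Any such figure is only as trustworthy as the fit between the normal curve and the actual grades.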

Statistical models have not been constructed for many possible experimental
situations. Thus, it may be desirable in a specific investigation to
construct and develop the properties of a new model, and this is much
easier when one already knows something about statistical models and
the associated procedures.

1.3. A BRIEF HISTORY OF STATISTICS

Even though data have been compiled almost from the beginning of
recorded time, the science of statistics has never been so broad in its scope
as it is today. At first statistics seems to have consisted in census taking.
About 3050 B.C. the ancient Egyptians collected data concerning wealth
and population before building the pyramids. There are two censuses of
the Israelites recorded in the book of Numbers. In 594 B.C. a census was
taken in Greece for the purpose of levying taxes. There are many other
records of census taking in most countries of the world from early times
to the present.

In addition to taking the census, the Romans prepared surveys of the
entire country and kept records of births and deaths. After the Middle
Ages, certain individuals as well as governments started keeping records
on such things as wealth, armies, commerce, laws, and national resources.

About the middle of the eighteenth century, Gottfried Achenwall, a
professor of philosophy in a German university, first used the word statistik,
and the name statistics was introduced into England by E. A. W. Zimmerman
about 1787. The Royal Statistical Society of London was founded in 1834
and the American Statistical Association in 1839; each of these societies
holds meetings periodically and publishes papers of current interest. Thus,
anyone who is interested may follow the growth of statistics over the last
125 years by looking at the records of these societies. In the first number of
the Journal of the Royal Statistical Society, issued in 1838-39, we read,
“Statistics may be said, in the words of the prospectus of this society, to be
‘the ascertaining and bringing together of those facts which are calculated
to illustrate the conditions and prospects of society.’”

The theory of probability and the normal distribution have been very
important in the development of statistics, and they are now of primary
importance in the theory and application of statistics. In the middle of the
seventeenth century, some gambling experiences with a particular game of 
dice led Chevalier de Mere, a French nobleman, to consult the famous Blaise 
Pascal (1623-62) concerning the most advantageous way to bet. Pascal 
solved the problem, and de Mere posed another problem which Pascal 
investigated. This led to a private correspondence with the French lawyer- 
mathematician Pierre de Fermat (1601-65) and to the first foundations of 
the theory of probability. After becoming acquainted with the contents of 
this correspondence, Christiaan Huygens (1629-95) developed some new 
ideas and in 1657 published a first book on probability. Jacob Bernoulli
(1654-1705), Abraham de Moivre (1667-1754), and Pierre Simon Laplace 
(1749-1827) made great contributions to the early theory and application 
of probability. Abraham de Moivre is responsible for the equation of the 
normal curve (1733). Much later, Laplace and Karl Friedrich Gauss (1777- 
1855) developed the same results independently of each other. 

Laplace in his studies of the origin of comets seems to have been the 
first to attack problems relating to rules of “inductive behavior,” that is, 
to the adjustment of our behavior to a limited number of observations. 
The geologist Charles Lyell (1797-1875), the biologist Charles Darwin 
(1809-82), and the monk Johann Gregor Mendel (1822-84) in his experi- 
mental breeding of plants based some of their work on statistical arguments. 
None of these men was a statistician, and they did not spend their energies 
in placing statistics on a firm foundation. 

Karl Pearson (1857-1936), initially a mathematical physicist, after
becoming interested in evolution spent nearly half a century in serious 
statistical research. He helped to found the journal Biometrika and gave 
the study of statistics its first great impetus. Sir Ronald Fisher (1890-1962) 
made many important contributions to statistics. Since the 1920’s Fisher 
and his students have also stimulated great interest in applications of sta- 
tistics in many fields, particularly agriculture, biology, and genetics. Some of 
the basic theory on hypothesis-testing was presented by J. Neyman (1894-) 
and E. S. Pearson (1895-) as late as 1936 and 1938. 

Thus, it was not until early in this century that statistics started to be 
used to any large extent outside of census taking and other specialized areas 
such as genetics and astronomy. Since the late 1920’s interest in the appli- 
cation of statistical methods to all types of problems has grown rapidly. 
It is interesting to note that prior to the 1920's applied statistics was pre- 
dominately descriptive in nature, and that since the 1920's statistical inference 
has grown to constitute nearly all of statistics. Indeed, today purely de- 
scriptive methods play a very minor and almost incidental role in statistical 
applications. 

It would be pointless to try to enumerate all the areas in which statistical 
methods are used. To mention only a few, statistics is becoming increasingly
important in agriculture, astronomy, business, biology, economics, chemistry,
engineering, industrial studies, insurance, medical research, meteorology,
physics, psychology, sociology, and transportation. The use of statistical
methods in each of these fields grew in a different way. In biology applications
started early; in transportation, late. In agriculture there is large
variation, which is difficult to control; in the physical sciences, in many
cases, the variation is small and relatively easy to control.

Statistics is not at the same stage of development in all of the different
fields of study. Different aspects are considered important in different areas,
and special techniques are required at the research level. However, the
fundamental principles on which any of the special techniques are built
are the same, and many are discussed in this book.

Fuller presentations of these and other topics are found in the references
of this chapter. The student is urged to read further. In fact, the serious
student should form a habit of reading some references with each chapter.
FREQUENCY DISTRIBUTIONS


The methods of statistics are concerned with the study of variation. The
nature of the variation is shown by a frequency distribution. Several types
of distributions for both continuous and discrete variables are considered.
Characteristics of distributions such as the central value and amount of
scatter are discussed.

2.1. DISCUSSION OF TERMS

We shall assume that you know the meaning of such terms as data,
object, individual, variable, collection, and set. Some terms in common usage,
such as observation and population, have different or special meanings as
they are applied in statistics and, thus, will need to be defined and discussed.

We shall think of an observation as a recording (usually numerical) of
information. Scores, measurements, ranks, and categories are types of
observations. A score is a numerical assessment of an individual or an object
on a scale. Examples are the grade on a chemistry test, the points scored
by a basketball team during a game, and an intelligence quotient. Scores
are also used to indicate quality of such things as meat and butter. We usually
think of a measurement as a numerical value indicating the extent or size
of such things as the height of a tree, the length of a rod, the volume of
a liquid, the weight of a chemical compound, the temperature of a room,
the pressure of a gas, the intensity of sound, and the amount of electric
current. The term rank is used to express the position of an object in regard
to some stated quality relative to all other objects under consideration at
the moment. For example, in a goodness-of-taste experiment involving
three brands of grape juice, say A, B, C, one judge might give brand A
rank 3, brand B rank 1 and brand C rank 2, where rank 1 indicates best
taste, rank 2 second best, and rank 3 third best. Ranks are also used in 
beauty contests, academic achievement of the West Point graduating class, 
certain market reports, etc. Categories are used to indicate color of eye, 
type of tree, kind of metal used in making a machine part, degree of interest 
in a television program, etc. Illustrations of observations which we do not 
consider to fall in the above four types, to mention only a few, are the 
number of defective parts in a manufacturing process, the rate of germina- 
tion of a seed, the proportion of cars exceeding the speed limit, and the 
number of students taking a course. 

An observation is made on some characteristic of an object. For example, 
the object might be a human skull and the observation might be head width, 
the object a day and the observation maximum temperature, or the object 
a human female and the observation color of hair. In each case, the obser- 
vation would differ from object to object. Thus, for example, we call head 
width a variable, since it differs from skull to skull. We shall always think
of a value of a variable as being a number. Thus, observation and value of 
a variable are synonymous terms so long as the recording of information is 
numerical. 

Variables may be continuous or discrete. A continuous variable is a
variable which can assume any real value between two distinct numbers,
and a discrete variable is one which can assume only isolated values. Examples
of variables which are discrete are the number of heads in 100 tosses
of a coin, and the number of mining accidents per year. Length of life of
a light bulb, velocity of a jet airplane, height of an adult male, brightness 
of light, and amount of heat resulting from friction are examples of continu- 
ous variables. The reader is cautioned against thinking of height, say, as 
a discrete variable. It is true that any ordinary set of data from experience 
will assume only isolated values, and, thus, you might tend to think of 
these data as being discrete. But heights can be measured only to the accuracy 
of the instrument, say to the nearest ten-thousandth of an inch. This intro- 
duces a rounding-off error, so the recorded measurement only approximates 
a number which could fall anywhere between two distinct real numbers. 
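
The effect of rounding can be sketched as follows. The heights and the instrument precision of half an inch are hypothetical, chosen so the arithmetic is easy to follow (the text's example of a ten-thousandth of an inch behaves the same way); the function names `record` and `interval` are likewise invented for illustration.

```python
def record(true_value, precision):
    """Round a measurement to the nearest multiple of the instrument's precision."""
    return round(true_value / precision) * precision


def interval(recorded, precision):
    """Range of true values that would all be recorded as the same number."""
    half = precision / 2
    return (recorded - half, recorded + half)


# Hypothetical true heights in inches, measured to the nearest 0.5 inch.
precision = 0.5
true_heights = [68.13, 68.24, 69.71]
recorded = [record(h, precision) for h in true_heights]

print(recorded)                          # recorded values look isolated
print(interval(recorded[0], precision))  # but each stands for a whole interval
```

Two different true heights share one recorded value here, which is why the recorded data appear discrete although the underlying variable is continuous.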

The statistician is not concerned with a single observation except as it 
relates to a collection. He is concerned with a collection having some 
common observable attribute, and he calls the collection of such observations 
a population. Sometimes the set of objects on which the observations are
made is said to be a population, but the statistician works with the population
of observations (or population of values of the variable). For example,
one might think of a collection of similar rocks as being the population,
but the statistician usually thinks of the population as being the collection
of similar observations, say specific gravity, made on the rocks. Also, all
cans of sliced pineapple prepared in the Hawaiian Islands during a given
season might be thought of as a population, but the statistician usually
thinks of the population as being the collection of similar observations,
say, weight of sliced pineapple with the juice drained off, made on all these
cans. Thus, when a statistician refers to a population, he usually thinks in
terms of a collection of numbers and not a collection of objects. That is,
he thinks of the set of values of a variable.

It should be observed that, even though the statistician deals with
numerical values, he does not forget the objects on which observations are
made. Thus he would not say, “Half the measurements are over 69 inches,”
when dealing with heights of adult human males in Roanoke City. He
would say, instead, that half of the males in Roanoke City 21 years of
age and over are more than 69 inches tall.

We have already observed that there is a population of observations
corresponding to a population of objects. Several examples have been given
of a population of values of a single variable. Such a population is called
a univariate population. We may wish also to consider populations of values
of two, three, or more variables. A population in two variables is called
bivariate, one in three variables trivariate, and one in more than one variable
multivariate. For example, corresponding to a population of adult males we
may have a univariate population of heights, a bivariate population of
heights and weights, or a trivariate population of heights, weights, and ages.
A population may consist of a finite or an infinite number of observations.
If it is finite and contains N observations, we say that the size of the population
is N; otherwise, it is called an infinite population. If N is sufficiently
small, we can comprehend the nature of the population just by examining
the set of observations. However, if the size of the population is finite and
large or infinite, it should be reduced in some way so that we can more
easily grasp its nature. This can be done

1. Tabularly or graphically by classifying the observations into classes
   of values of the variable.

2. Arithmetically by finding the numerical value of a few “features,”
   called parameters, determined from the observations, which practically
   characterize the population, that is, which bring out the most important
   aspects of the population.
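
The two kinds of reduction can be sketched for a small finite population. The twelve values, the class width of 5, and the choice of the mean and standard deviation as the parameters are all assumptions made for illustration; this passage of the text does not itself say which parameters are meant.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical finite population of N = 12 observations.
population = [62, 65, 67, 68, 68, 70, 71, 71, 72, 74, 75, 77]

# 1. Tabular reduction: classify each observation into a class of width 5.
class_width = 5
counts = Counter((x // class_width) * class_width for x in population)
frequency_table = {(lo, lo + class_width): counts[lo] for lo in sorted(counts)}
for (lo, hi), freq in frequency_table.items():
    print(f"{lo} to under {hi}: {freq}")

# 2. Arithmetic reduction: two parameters computed from all N observations.
population_mean = mean(population)   # a central value
population_sd = pstdev(population)   # amount of scatter (population form)
print(f"mean = {population_mean}, standard deviation = {population_sd:.2f}")
```

Either reduction replaces twelve numbers with a handful, at the cost of some detail; the table keeps the shape of the distribution, while the parameters keep only its center and scatter.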


It is with parameters that the statistician is mainly concerned. Since populations
differ, they cannot all be described in the same way. However, there
are certain features which most have in common, and it is these features
(parameters) which we first examine in some detail.

In most cases the investigator has observations on not all of the objects
of a population; he has only a sample, which is some subcollection of the
population of values. Whenever this happens, the best he can do is to estimate
the nature of the population. This is done by determining the nature of the
sample and using the properties of the sample to estimate the properties
which characterize the population. The process by which we reach conclusions
about a population from a sample taken from the population is known
as statistical inference. Most of this book deals with problems of statistical
inference, but in this chapter we wish to discuss 1 and 2 of the preceding
paragraph along with some similar methods used to describe a sample. It
will soon be observed that the graphical methods are the same for studying
both population and sample, and that the arithmetic methods for population
and sample differ in some few but important respects.
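
A minimal sketch of estimating a population property from a sample is given below. The population is simulated, and its size of 1,000, its center of 69, its spread of 3, and the sample size of 25 are all assumptions made for illustration. In practice the population mean would be unknown and only the sample would be available.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 values of a variable.
population = [random.gauss(69, 3) for _ in range(1000)]

# A sample is a subcollection of the population of values.
sample = random.sample(population, 25)

population_mean = mean(population)  # the parameter (normally unknown)
sample_mean = mean(sample)          # the estimate computed from the sample

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means will generally differ somewhat; how far apart they are likely to be is the kind of question that statistical inference addresses.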

Not all types of observations are subject to the same statistical methods.
In the first chapters only those methods which have broadest applications
will be described; the more specialized methods will be left for later chapters.
To illustrate, in most statistical studies the order in which observations are
obtained is not important, but in certain weather and manufacturing studies,
for example, order is very important. Thus, special methods are needed for
dealing with observations falling in this category.

2.2. FREQUENCY DISTRIBUTIONS IN ONE VARIABLE 

We shall now consider some methods for “picturing” the distribution of 
observations in a population or a sample. Typical examples are used to 
illustrate the methods which are generally applicable. (For more extensive 
treatment of the methods found in this section, see the references at the end 
of this chapter.) 

2.2.1. Continuous Variable

If the population or sample is small, it can probably best be described 
by a simple listing of the values of the variable. For example, the weights of 
the three whooping cranes in captivity listed in increasing order might ade- 
quately describe the population. The weights would hardly need be reduced 
or summarized in any way. 

Note. If the sole objective of the investigation is to describe a set of
observations, then it makes no difference whether the set is thought of as
being a population or as being a sample. However, when the main objective
is to use the sample to make statistical inferences with regard to the population,
the arithmetic methods for samples differ in some respects from those
for populations, and, in this case, the description of the sample is considered
an incidental part of the statistical procedure.

Some finite populations or samples are of such a size that the investigator 
might be in doubt as to what procedure to follow in order best to describe 
it. If the size of the population or sample were smaller, he would give a simple 
listing of the values; if the size were larger, he would definitely classify the
observations in some way. The data in Table 2.1 might be considered to
fall in this category. However, we shall use these raw data, that is, data listed
in the order obtained, alphabetical order, or some other arbitrary way, to
illustrate some of the tabular and graphic methods of classifying observations.
It is not suggested that all the methods presented in connection with
these data be used at one time or, in fact, that any need be used.


Table 2.1

Percentage of Humans in Each State 65 Years
Old and Over in 1950*

State            Per cent    State             Per cent

Alabama             6.5      Nebraska             9.8
Arizona             5.9      Nevada
Arkansas            7.8      New Hampshire       10.8
California          8.5      New Jersey           8.1
Colorado            8.7      New Mexico
Connecticut         8.8      New York
Delaware            8.5      North Carolina
Florida             8.6      North Dakota         7.8
Georgia             6.4      Ohio
Idaho               7.4      Oklahoma
Illinois            8.7      Oregon               8.7
Indiana             9.2      Pennsylvania         8.4
Iowa               10.4      Rhode Island         8.9
Kansas             10.2      South Carolina       5.4
Kentucky            8.0      South Dakota         8.5
Louisiana           6.6      Tennessee            7.1
Maine              10.2      Texas                6.7
Maryland            7.0      Utah                 6.2
Massachusetts      10.0      Vermont             10.5
Michigan            7.2      Virginia             6.5
Minnesota           9.0      Washington           8.9
Mississippi         7.0      West Virginia        6.9
Missouri           10.3      Wisconsin            9.0
Montana             8.6      Wyoming              6.3

* Bureau of the Census, County and City Data Book /9W (A Statistical Abstract
Supplement), page 2.


Generally the first thing to do in order better to grasp the nature of the
data is to arrange the numbers in increasing order of magnitude or to list
each number and the corresponding frequency of its occurrence. Of course,
in doing either, our attention is focused on the numbers themselves and not
on the objects associated with those numbers. Table 2.2 shows a rearrangement
of the data of Table 2.1, listed in order of increasing percentage. A
simple graphical representation of the data is given in the form of a dot
frequency diagram in Fig. 2.1. In general, Table 2.2 and Fig. 2.1 are not used
so much to show the nature of the population, but are most frequently used
as intermediate steps to a more meaningful “picture” of the population
illustrated below.

Table 2.2

Percentages from Table 2.1 Arranged in Increasing Order

 4.9     6.5     7.2     8.5     8.7     9.8
 5.4     6.6     7.4     8.5     8.8    10.0
 5.5     6.7     7.8     8.5     8.9    10.2
 5.9     6.9     7.8     8.6     8.9    10.2
 6.2     6.9     8.0     8.6     8.9    10.3
 6.3     7.0     8.1     8.7     9.0    10.4
 6.4     7.0     8.3     8.7     9.0    10.5
 6.5     7.1     8.4     8.7     9.2    10.8

In passing, we observe some facts which are immediately evident from 
Table 2.2 or Fig. 2.1, but which require considerable time to obtain when 
determined from Table 2.1. The saving in time becomes greater as the number
of observations increases. We observe that the highest percentage is 10.8
and the lowest is 4.9, with a range of 10.8 - 4.9 = 5.9. We see what percentages
are repeated most often and which percentages do not occur. Two-thirds
of the states have percentages in the interval from 6.5 to 9.0, which is less 
than half the range. A large concentration of values is found in the interval 
from 8.5 to 9.0. In fact, 17 of the 48 states have percentages in this interval. 
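
The ordering and counting described above can be sketched in a few lines of Python. This is only an illustration: the values are those printed in Table 2.2, while the variable names are assumptions made for the example and do not come from the text.

```python
from collections import Counter

# Percentages as printed in Table 2.2 (eight rows of six values).
table_2_2_rows = [
    [4.9, 6.5, 7.2, 8.5, 8.7, 9.8],
    [5.4, 6.6, 7.4, 8.5, 8.8, 10.0],
    [5.5, 6.7, 7.8, 8.5, 8.9, 10.2],
    [5.9, 6.9, 7.8, 8.6, 8.9, 10.2],
    [6.2, 6.9, 8.0, 8.6, 8.9, 10.3],
    [6.3, 7.0, 8.1, 8.7, 9.0, 10.4],
    [6.4, 7.0, 8.3, 8.7, 9.0, 10.5],
    [6.5, 7.1, 8.4, 8.7, 9.2, 10.8],
]

# Arrange the observations in increasing order of magnitude.
values = sorted(v for row in table_2_2_rows for v in row)
n = len(values)

# List each number with the frequency of its occurrence.
frequency = Counter(values)

# Range = highest value minus lowest value (rounded to remove float noise).
value_range = round(max(values) - min(values), 1)

# Number of states with percentages in the interval from 6.5 to 9.0.
in_interval = sum(1 for v in values if 6.5 <= v <= 9.0)

# The value repeated most often, with its frequency.
most_common_value, most_common_count = frequency.most_common(1)[0]

print(n, value_range, in_interval, most_common_value, most_common_count)
```

With 48 observations the interval from 6.5 to 9.0 holds 32 of them, which is the two-thirds mentioned in the text.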



Fig. 2.1 Dot Frequency Diagram for Data of Table 2.1 (horizontal axis: per cent over 64 years of age)


The pattern of dots in Fig. 2.1 is so erratic that it is difficult to “picture”
the set as a whole. Often a better picture of the observations is made possible
by grouping them into classes (intervals or cells). This has a smoothing effect
on the graph if the grouping is properly carried out. It is for this reason that
we seek the answer to the following two questions: how many classes should
be used, and what should be the length (see the next paragraph for definition)
of each? There is no single answer to either question which holds in all cases.
However, some general principles are given to guide the investigator in arriving
at answers to these questions. The length of each class should be the same,
unless there is very good reason to the contrary. Whenever there is a large
number of observations, say more than 200, experience indicates that in most
cases 10 to 20 classes are adequate. In our problem, assuming for the moment
that grouping is desirable, we would expect to have fewer classes. In determining
the number of classes in any problem, two things should be kept in
mind. One is that the most natural or convenient classes be selected, and
the other is that we select so that the “picture” of the population is as smooth
as possible.

That numerical value which divides two successive classes is called a class
boundary. The class length is the numerical difference in the two boundaries
of that class. If the classes have equal length, then boundaries of the extreme
classes are obtained by decreasing the smallest boundary value by the class
length and by increasing the largest boundary value by the class length;
otherwise, these extreme boundaries are usually assigned when the particular
problem is specified.

In our illustration, one unit is the most convenient class length. Further,
noting that the largest value is 10.8 and the smallest is 4.9 and that the
range is 5.9, we decide to use seven classes. We still must determine whether
to use intervals like 7.0 to 8.0 or like 7.5 to 8.5. For reasons which become
clear at the end of the next paragraph, we select intervals of the latter type.

It is important that the class boundaries be selected so that there is no
doubt concerning the class into which each observation falls. Thus, it is
necessary that we select the boundaries so that they differ from any observation
or give a rule which automatically places in a unique class any observation
which is the same as a boundary. For example, with two successive
intervals like 7.5-8.5 and 8.5-9.5 we must give a rule for placing the observation
8.5. If we wish to have an observation fall in a unique class automatically,
we may usually select class boundaries halfway between two possible
adjacent observations. For example, 7.55 is halfway between 7.5 and 7.6.
There are computational advantages to selecting boundaries like 7.5, but
the more appropriate method is to select boundaries like 7.55, since our
observed values are rounded off according to the rule that 7.2, say, represents
any number between 7.15 and 7.25. Thus, for illustrative purposes we select
the class boundaries as 4.55, 5.55, 6.55, 7.55, 8.55, 9.55, 10.55, and 11.55.

The mid-value of a class is called the class mark and is used as the
representative value for that class. The frequency with which observations fall
in a class is called class frequency.
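
As a rough sketch of how the boundaries and class marks chosen above might be generated, the following Python fragment works in integer hundredths so that values such as 4.55 are not disturbed by floating-point error. The names `LOWEST`, `LENGTH`, `K`, and `class_index` are assumptions made for this illustration only.

```python
# All arithmetic is done in integer hundredths of a per cent.
LOWEST = 455   # 4.55, halfway between the possible observations 4.5 and 4.6
LENGTH = 100   # class length of one unit
K = 7          # number of classes chosen in the text

# Eight boundaries delimit seven classes: 4.55, 5.55, ..., 11.55.
boundaries = [(LOWEST + LENGTH * i) / 100 for i in range(K + 1)]

# The class mark is the mid-value of each class: 5.05, 6.05, ..., 11.05.
marks = [(LOWEST + LENGTH * i + LENGTH // 2) / 100 for i in range(K)]


def class_index(x):
    """Return the 0-based class of an observation recorded to one decimal.

    Because every boundary ends in 5 in the second decimal place, no
    observation can coincide with a boundary, so the class is unique.
    """
    hundredths = round(x * 100)
    return (hundredths - LOWEST) // LENGTH


print(boundaries)
print(marks)
print(class_index(8.5), class_index(8.6))  # adjacent values, different classes
```

The observation 8.5, which would need a special rule with boundaries like 7.5 and 8.5, falls unambiguously in the class 7.55-8.55 here.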


Once we have decided on the number of classes and the location of the
class boundaries, a frequency table such as Table 2.3 may be prepared. (So
long as a table contains a frequency column and a column for class boundaries
or class marks, it is called a frequency table, no matter how many other
columns it may have.) From the class frequency column of Table 2.3, we
note that grouping produced a smoothing effect and that observations fell
most often in the interval 8.55-9.55. Using this table, we are able to present
the observations graphically in the form of frequency diagrams in several
standard ways. It should be noted that all the columns of Table 2.3 are not
required in a single study or in a single diagram.


Table 2.3

Frequency Table of Data in Table 2.1 (Grouped)

Class          Class   Tabulation        Class      Relative   Cumulative  Cumulative
Boundaries     Mark                      Frequency  Frequency  Frequency   Relative
               x_i                       f_i        f_i/N      f_i'        Frequency
                                                                           f_i'/N

 4.55- 5.55     5.05   |||                   3       0.062         3        0.062
 5.55- 6.55     6.05   ||||| |               6       0.125         9        0.188
 6.55- 7.55     7.05   ||||| ||||            9       0.188        18        0.375
 7.55- 8.55     8.05   ||||| ||||            9       0.188        27        0.562
 8.55- 9.55     9.05   ||||| ||||| |||      13       0.271        40        0.833
 9.55-10.55    10.05   ||||| ||              7       0.146        47        0.979
10.55-11.55    11.05   |                     1       0.021        48        1.000
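
The columns of Table 2.3 can be recomputed from the ordered data of Table 2.2. The sketch below is one possible way to do so, assuming the class boundaries 4.55, 5.55, ..., 11.55 selected in the text; the variable names are illustrative and not part of the original.

```python
from itertools import accumulate

# Observations of Table 2.2 stored as value -> number of states.
counts = {
    4.9: 1, 5.4: 1, 5.5: 1, 5.9: 1, 6.2: 1, 6.3: 1, 6.4: 1, 6.5: 2,
    6.6: 1, 6.7: 1, 6.9: 2, 7.0: 2, 7.1: 1, 7.2: 1, 7.4: 1, 7.8: 2,
    8.0: 1, 8.1: 1, 8.3: 1, 8.4: 1, 8.5: 3, 8.6: 2, 8.7: 4, 8.8: 1,
    8.9: 3, 9.0: 2, 9.2: 1, 9.8: 1, 10.0: 1, 10.2: 2, 10.3: 1, 10.4: 1,
    10.5: 1, 10.8: 1,
}

K = 7  # seven classes of unit length starting at 4.55

# Tabulate: the class index is found in integer hundredths to avoid
# floating-point trouble near the boundaries.
class_freq = [0] * K
for value, count in counts.items():
    index = (round(value * 100) - 455) // 100
    class_freq[index] += count

n = sum(class_freq)                          # total frequency N
rel_freq = [f / n for f in class_freq]       # f_i / N
cum_freq = list(accumulate(class_freq))      # f_i'
cum_rel_freq = [c / n for c in cum_freq]     # f_i' / N

for i in range(K):
    lower = 4.55 + i
    print(f"{lower:5.2f}-{lower + 1:5.2f}  {class_freq[i]:2d}  "
          f"{rel_freq[i]:.3f}  {cum_freq[i]:2d}  {cum_rel_freq[i]:.3f}")
```

The printed relative frequencies may differ from the table in the third decimal place, depending on how exact halves such as 0.0625 are rounded.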


Perhaps the most useful frequency diagram is the frequency histogram
(relative frequency histogram if the right-hand vertical axis is used) illustrated
in Fig. 2.2. Putting the class boundaries and marks on the horizontal axis,
x-axis, and the frequencies on the vertical axis, f-axis, of a rectangular
co-ordinate system, we erect a rectangle above each interval whose area is
proportional (most often equal) to the frequency of that class. Note that
in our illustration the mark x_i of the ith class (i being any integer from 1
through k, the number of classes) is also the mid-point of the base of the
ith rectangle making up the histogram, and that f_i represents the height of
the ith rectangle as well as the frequency of the ith class. Hence, the area of
each rectangle is a number which is the same as the frequency for the
corresponding interval, and the total area of the histogram, that is, the sum of
the areas of the rectangles, is the total frequency. In particular, the total
frequency is 48 and the total area is (48)(1) = 48 square units. If some class
length had not been equal to the others, it would have been necessary to
adjust the height of its rectangle so as to express the area of this rectangle
in units of all other rectangles. (Uses and limitations of the histogram will
be discussed later.)

Fig. 2.2 Frequency Histogram for Data in Table 2.3 (horizontal axis: per cent above 64 years of age)
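
The relation between height, class length, and area can be checked numerically. In the sketch below the class frequencies are those of Table 2.3; the merged class 9.55-11.55 is a hypothetical variation, not found in the text, introduced only to show how a height would be adjusted for an unequal class length.

```python
# Class frequencies f_i from Table 2.3; every class has length 1.
class_freq = [3, 6, 9, 9, 13, 7, 1]
class_length = 1.0

# With unit class length the height of each rectangle equals f_i,
# and the area of each rectangle equals its frequency.
heights = [f / class_length for f in class_freq]
total_area = sum(h * class_length for h in heights)

# Hypothetical variation: merge the last two classes into 9.55-11.55.
# The height must become frequency per unit length so that the area
# of the wider rectangle still equals its frequency.
merged_freq = class_freq[-2] + class_freq[-1]
merged_length = 2.0
merged_height = merged_freq / merged_length
merged_area = merged_height * merged_length

print(total_area)                   # total frequency
print(merged_height, merged_area)   # adjusted height, unchanged area
```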

Histograms are normally used for grouped data. Grouping introduces
approximations, and this limitation is reflected in the histogram, which may
be used to indicate that each value in an interval appears with the same
relative frequency. Thus, in the interval from 5.55 to 6.55 the histogram
indicates that 5.6, 5.7, ..., 6.5 occur with equal relative frequencies. This
is clearly not the case, as we can see by looking at Table 2.2. This limitation
of the histogram becomes less troublesome as the class length becomes
smaller or the number of observations becomes larger.

The frequency polygon in Fig. 2.3 illustrates another method of
representing frequency diagrams. A frequency polygon is constructed by plotting
the points (x_1, f_1), (x_2, f_2), ..., (x_k, f_k) on a rectangular co-ordinate system
and then connecting these points by straight-line segments. In order to
complete the picture it is often desirable to add one class mark with zero
frequency to each end of the diagram.

Fig. 2.3 Frequency Polygon for Data in Table 2.3

Frequency polygons can be misleading if they are incorrectly used. For 
example, the graph in Fig. 2.3 is for grouped data. Thus, caution must be
used in “reading off” frequencies from the graph corresponding to values of
x which are not class marks. A frequency polygon is used mainly to indicate
the shape of a given set of data.

There is another type of frequency diagram, called the cumulative
frequency polygon or ogive, which is often used. Figure 2.4 illustrates the graph
of the ogive using data from Table 2.3. The main difference between a
cumulative frequency polygon and the corresponding frequency polygon is
that of locating the point (x_i', f_i') corresponding to (x_i, f_i) in the frequency
polygon. The x_i' co-ordinate is either the upper or lower class boundary,
depending on whether the cumulative frequency polygon is a “less than”
or a “more than” type of diagram, and the f_i' is the cumulative frequency for
the class containing x_i. Figure 2.4 is an ogive which gives the number of
states with percentages less than the corresponding number which is given
on the horizontal scale. Thus, in our example, we use the upper boundary
of a class when we plot the polygon. (Some important uses of the ogive will
be brought out later.) It should be observed at this time that in a cumulative
frequency polygon, the height above a particular point on the x-axis is the
same numerical value as the area in the corresponding histogram to the left
of that point. For example, the height of the cumulative polygon above
x = 8.55 in Fig. 2.4 is 27, and the area to the left of x = 8.55 in Fig. 2.2
is also 27.
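
The equality between the height of the “less than” ogive and the histogram area to the left of a point can be verified with a short sketch. It assumes the boundaries and frequencies of Table 2.3 and straight-line segments between the plotted points; the function names are illustrative assumptions.

```python
import math
from itertools import accumulate

boundaries = [4.55, 5.55, 6.55, 7.55, 8.55, 9.55, 10.55, 11.55]
class_freq = [3, 6, 9, 9, 13, 7, 1]

# "Less than" ogive: cumulative frequency plotted at each upper boundary,
# starting from zero at the lowest boundary.
cum_freq = [0] + list(accumulate(class_freq))
ogive_points = list(zip(boundaries, cum_freq))


def ogive_height(x):
    """Height of the cumulative polygon above x (linear interpolation)."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return float(cum_freq[-1])
    for (x0, y0), (x1, y1) in zip(ogive_points, ogive_points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def histogram_area_left(x):
    """Area of the frequency histogram lying to the left of x."""
    area = 0.0
    for lo, hi, f in zip(boundaries, boundaries[1:], class_freq):
        covered = min(max(x - lo, 0.0), hi - lo)  # part of the base left of x
        area += f * covered / (hi - lo)
    return area


print(ogive_height(8.55), histogram_area_left(8.55))
print(ogive_height(8.0), histogram_area_left(8.0))
```

The two functions agree at class boundaries and also at interior points such as x = 8.0, because the straight segments of the polygon accumulate area at the constant rate given by the height of each rectangle.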

Figures 2.2, 2.3, and 2.4 illustrate methods of “picturing” distributions 
in terms of frequencies when the left-hand vertical axis is used. However, 
each of these graphs could have been drawn by using relative frequencies 
or proportions in place of frequencies, as is indicated by the right-hand 
vertical axis in Fig. 2.2 and 2.4. This would not change the shape of the dia- 
grams, but the area of the histogram and the area under the polygons would 
be reduced in the ratio 1 : 48. In general, the areas would be divided by N,
where N denotes the total frequency. Again, it would be quite natural for us 
to call these diagrams distributions. In order to distinguish the diagrams, 
we might use the terms frequency histogram and relative frequency histo- 
gram, frequency polygon and relative frequency polygon, and cumulative 
frequency polygon and cumulative relative frequency polygon. 

The points (vertices) of the polygons may be connected by smooth curves
in some cases. We then call them frequency and cumulative frequency curves
when frequencies are used, and relative frequency and cumulative relative
frequency curves when relative frequencies are used. We shall see later that
the word “density” is associated with one class of graphs and the word
“distribution” is associated with another. For this reason we avoided using
the term “distribution” in connection with any of the graphs mentioned
above.

We now consider two practical uses of the ogive. From this graph the
investigator may determine the number or relative frequency of observations
less than a given value. Conversely, he may determine the value below
which the observations fall with approximately a given relative frequency.
Such values are called quantiles or fractiles. Either the frequency scale or
the relative frequency scale may be useful. Generally the relative frequency
scale is employed because the investigator is interested in comparing two
sets of observations or in comparing one set of observations with a standard.

Special quantiles such as percentiles, deciles, and quartiles are normally
used. Those x values of the observations corresponding to relative frequencies
of 0.01, 0.02, ..., 0.99 or percentages of 1, 2, ..., 99 are called first
percentile, second percentile, ..., ninety-ninth percentile and are denoted by
P_1, P_2, ..., P_99; those x values corresponding to percentages of 10, 20, ...,
90 are called first decile, second decile, ..., ninth decile and are denoted
by D_1, D_2, ..., D_9; and those x values corresponding to percentages of
25, 50, 75 are called first quartile, second quartile, third quartile and are denoted
by Q_1, Q_2, Q_3. Clearly, the fiftieth percentile P_50, the fifth decile D_5, and the
second quartile Q_2 represent the same x value. This particular x value is
called the median of the distribution. In order to find P_10 = D_1, say, from
Fig. 2.4 we draw a line parallel to the x-axis passing through the 0.10 point
of the right-hand vertical axis and a point on the cumulative polygon.
From the point of intersection on the polygon we drop a perpendicular to
the x-axis. The point where the perpendicular cuts the x-axis is the tenth
percentile value and is P_10 = 5.85. That is, 5.85 is the value (approximate)
below which ten per cent of the observations fall. In order to find the
percentile rank graphically for any observation P_k, we reverse the steps given
above and find the percentage point k (relative frequency k/100) on the
right-hand vertical scale. For example, if P_k = 8.00, it follows that k = 45.
Clearly, the cumulative polygon may be used to find the percentage
(approximate) of the states with x values between two values x_1 and x_2. We
must keep in mind that graphic methods introduce approximation.
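
The graphical steps above amount to linear interpolation along the cumulative polygon, which can be sketched as follows. The data are those of Table 2.3; the functions `percentile` and `percentile_rank` are hypothetical helpers written for this illustration and give arithmetic counterparts of the graphical readings, so small differences from values read off a drawing are to be expected.

```python
import math

boundaries = [4.55, 5.55, 6.55, 7.55, 8.55, 9.55, 10.55, 11.55]
class_freq = [3, 6, 9, 9, 13, 7, 1]
cum_freq = [0, 3, 9, 18, 27, 40, 47, 48]   # "less than" cumulative frequencies
n = 48


def percentile(k):
    """x value below which about k per cent of the observations fall (0 < k < 100)."""
    target = k / 100 * n                      # cumulative frequency sought
    for i in range(len(class_freq)):
        if cum_freq[i] <= target <= cum_freq[i + 1]:
            lo, hi = boundaries[i], boundaries[i + 1]
            # Move along the straight segment of the polygon inside class i.
            return lo + (hi - lo) * (target - cum_freq[i]) / class_freq[i]


def percentile_rank(x):
    """Approximate percentage of observations less than x."""
    for i in range(len(class_freq)):
        lo, hi = boundaries[i], boundaries[i + 1]
        if lo <= x <= hi:
            height = cum_freq[i] + class_freq[i] * (x - lo) / (hi - lo)
            return 100 * height / n


print(round(percentile(10), 2))        # tenth percentile, P_10 = D_1
print(round(percentile(50), 2))        # median, P_50 = D_5 = Q_2
print(round(percentile_rank(8.0), 1))  # percentile rank of 8.00
```

Interpolation reproduces P_10 = 5.85 and gives a percentile rank of roughly 46 for the value 8.00, in line with the approximate graphical reading quoted in the text.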

The tables and diagrams presented so far are obviously for finite
collections of data. The data may represent all the observations in a finite
population or only a finite sample from an infinite population or larger
finite population. In either case the methods presented may be thought of
as describing or picturing the set of observations.

If the population is infinite in size, an investigator will never have all
observations. Thus he must always work with a sample which at best cannot
indicate the true nature of the population in all details. Since the sample
gives only an approximation to the population, he usually introduces some
theoretical distribution, that is, mathematical model, which is thought to
describe the population of possible or hypothetical observations. The
statistician studies many such theoretical distributions which are thought to
represent the true distribution of observations. The normal distribution, also
called Gaussian distribution, is the most important continuous theoretical
distribution and will be discussed in detail later along with others.

At this time we consider an illustration of a sample of observations taken
from a population which may be thought of as an infinite population.
Table 2.4 gives the acidity level expressed in pH units of 30,268 batches of
Virginia soil tested in 1955. (The population is the set of pH values of
all batches of soil in Virginia.) The maximum frequency is for the pH
range 6.2-6.3 (class boundaries are 6.15 and 6.35); the distribution is
smooth, and drops more rapidly for small pH values than for large values.
Note that the first class has no fixed lower boundary and that the last class
has no fixed upper boundary. Also, note that the class boundaries are not
given and that the limits of the actual observed values are indicated in the
table in place of boundaries.

Table 2.4

Acidity Level of Batches of Virginia Soil Tested in 1955*

pH Range     Frequency     pH Range     Frequency

Below 5.0        551       6.2-6.3         3548
5.0-5.1         1015       6.4-6.5         3055
5.2-5.3         2072       6.6-6.7         2497
5.4-5.5         2852       6.8-6.9         1755
5.6-5.7         3147       7.0-7.1         1222
5.8-5.9         3362       7.2-7.3          915
6.0-6.1         3521       Above 7.3        756

                           Total         30,268

* Courtesy of Virginia Polytechnic Institute Soil Testing Laboratory.

Fig. 2.5 Histogram of Data in Table 2.4 (horizontal axis: pH values)

The histogram in Fig. 2.5 contains more intervals than the one in Fig.
2.2 and has a smoother appearance. The first and last intervals are made
the same length as the others, and it is assumed that none of the batches
had pH levels above 7.55 or below 4.75. We shall use Fig. 2.5 later in
connection with a discussion on the theoretical distribution.
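
A minimal sketch of how the open-ended classes of Table 2.4 might be closed for plotting is given below. It assumes, as stated in the text, that the first and last intervals have the same length (0.2) as the others, so that the boundaries run from 4.75 to 7.55; the variable names are illustrative only.

```python
# pH ranges and frequencies from Table 2.4.
ph_classes = [
    ("Below 5.0", 551), ("5.0-5.1", 1015), ("5.2-5.3", 2072),
    ("5.4-5.5", 2852), ("5.6-5.7", 3147), ("5.8-5.9", 3362),
    ("6.0-6.1", 3521), ("6.2-6.3", 3548), ("6.4-6.5", 3055),
    ("6.6-6.7", 2497), ("6.8-6.9", 1755), ("7.0-7.1", 1222),
    ("7.2-7.3", 915), ("Above 7.3", 756),
]

freq = [f for _, f in ph_classes]
n = sum(freq)

# Class boundaries in integer hundredths (4.75, 4.95, ..., 7.55), assuming
# the open classes are given the common length 0.2.
boundaries = [(475 + 20 * i) / 100 for i in range(len(freq) + 1)]

# Relative frequencies, i.e. rectangle heights on the relative scale.
rel_freq = [f / n for f in freq]

# Class with the maximum frequency.
modal_label, modal_freq = max(ph_classes, key=lambda item: item[1])

print(n, boundaries[0], boundaries[-1])
print(modal_label, round(modal_freq / n, 3))
```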

2.2.2. Discrete Variable

Many times the investigator obtains observations by counting such
things as the number of petals on a daisy, the number of successes in n tosses
of a coin, or the number of cars passing a certain point each day. Observations
of this type are referred to as counted data or enumeration data. At this time
we wish to consider methods of describing or picturing such data, and at
the same time to compare these methods with those used in handling a
continuous variable. This can best be done with the use of an illustration.

The data in Table 2.5, taken from Genetics, Vol. 38, show the frequency
distribution of the number of bristles on the sixth abdominal sternite of
female fruit flies (Drosophila pseudoobscura). The distribution is fairly
regular, with a maximum frequency at 19 and near maximum frequency at 20.
There is a steady decrease in frequency as the number of bristles increases
or decreases from 19 and 20.

Table 2.5

Bristles on the Sixth Abdominal Sternite of Female Fruit Flies*

The frequency distribution of Table 2.5 is represented graphically in
Fig. 2.6, in which the number of bristles is the abscissa and the correspond- 
ing frequency is the ordinate. This figure clearly shows that the variable 
takes only isolated values. For some purposes, however, rectangles are 
drawn with unit base and height equal to the frequency. In this case, the 
number of bristles is the mid-value of the base of each rectangle. This gives 
a histogram which has the appearance of those in Fig. 2.2 and 2.5, but it 
should be kept in mind that the variable is not continuous when a histogram 
is used for a discrete variable. 

The cumulative frequency polygon for a discrete variable shown in
Fig. 2.7 is different in nature from the one shown in Fig. 2.4. We might term
the graph in Fig. 2.7 a step polygon, where the height of a step indicates
the frequency corresponding to a given number of bristles. The ordinate
represents the total number of flies with number of bristles equal to or less
than the corresponding abscissa value. We could represent the cumulative
frequency polygon by isolated ordinates above each abscissa value. The
same applications can be made to both representations. From this graph it
is possible to obtain percentiles and percentile ranks, as we did in the
continuous case. However, it should be noted that corresponding to certain
relative frequencies (percentile ranks) there is a whole interval of values which
could be used as percentiles. For example, Ptt may be any value between 17
and 18, including 17.

Fig. 2.6 Frequency Distribution of Data in Table 2.5

Fig. 2.7 Cumulative Frequency Polygon for Data in Table 2.5
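
The step behaviour of a cumulative frequency polygon for a discrete variable, and the resulting interval of admissible percentile values, can be sketched as follows. The frequencies used here are hypothetical and are NOT the data of Table 2.5; the function names are likewise assumptions made for the illustration.

```python
from itertools import accumulate

# Hypothetical frequencies of a discrete variable (not the data of Table 2.5).
bristle_freq = {16: 2, 17: 3, 18: 5}

values = sorted(bristle_freq)
cum_freq = list(accumulate(bristle_freq[v] for v in values))
n = cum_freq[-1]


def cumulative(x):
    """Step function: number of observations equal to or less than x."""
    return sum(f for v, f in bristle_freq.items() if v <= x)


def percentile_interval(k):
    """Return (low, high) for the k-th percentile.

    If the cumulative relative frequency equals k per cent exactly at some
    value, every x with low <= x < high qualifies; otherwise low == high.
    Integer arithmetic (100 * cum versus k * n) avoids rounding problems.
    """
    for i, v in enumerate(values):
        if 100 * cum_freq[i] > k * n:
            return (v, v)
        if 100 * cum_freq[i] == k * n:
            nxt = values[i + 1] if i + 1 < len(values) else v
            return (v, nxt)
    return (values[-1], values[-1])


print(cumulative(17), cumulative(17.5))   # the step is flat between 17 and 18
print(percentile_interval(50))            # a whole interval of values
print(percentile_interval(40))            # a single value
```

With these hypothetical counts exactly half of the observations are 17 or less, so any value from 17 up to, but not including, 18 could be used as the fiftieth percentile.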

Just as in the continuous case, the set of data in a discrete experiment
may represent all the observations or only a sample. If all the observations
are known, the methods just described represent an accurate (in so far as
the particular method is capable) picture of the distribution. If the
observations in an experiment represent only a sample from either a finite or infinite
population, then the methods just described give, at best, a good
approximation to the true distribution. As in the continuous case, this leads to
consideration of discrete theoretical distributions which are thought to represent
true distributions of observations or populations. The binomial distribution
and the Poisson distribution are perhaps the most useful discrete theoretical
distributions, the binomial being finite and the Poisson infinite. (These
distributions, along with certain others, will be discussed in detail later.)

The frequency distributions of the three illustrative examples as indicated
by Fig. 2.2, 2.5, and 2.6 are roughly bell-shaped; that is, they are symmetrical,
with most of the values falling somewhere near the middle of the range of
values. Distributions of this type are the most common in practice, but many
important variables have frequency distributions of a very different form.
For example, if the variable is wealth of an American household or number
of vehicles passing a certain point on a highway per ten-second period, the
frequency distribution looks like the right half of a bell-shaped distribution.
A distribution of this type is said to be skewed to the right, a skewed
distribution being one which lacks symmetry with respect to a vertical axis. Some
variables are uniformly or approximately uniformly distributed; that is, each
value of the variable occurs with equal or nearly equal frequency. For
example, the frequency of occurrence of each number on the face of an
honest die in 6000 tosses would be approximately 1000. Some distributions
have maximum points at each end of the range of values and a minimum
point near the middle. Such distributions are known as U-shaped
distributions. Pearse [8] gives an example of a U-shaped distribution, showing the
frequencies of estimated intensities of cloudiness at Greenwich during the
years 1890-1904 (excluding 1901) for the month of July. (Further examples
of the above types of distributions, along with other types of distributions,
may be found in the exercises in Sect. 2.2.3.)


2.2.3. Exercises 

Since many exercises in other sections refer to the data in these exercises,
the solutions should be carefully prepared and saved for later use.

2.1. Give three examples of a bell-shaped distribution. 

2.2. Give two examples of a uniform distribution. 

2.3. Give two examples of a distribution skewed to the right. 

2.4. Describe and illustrate a distribution which is not bell-shaped or uni- 
form or skewed to the right. 

2.5. Which of the distributions in Exercises 2.1 through 2.4 are for a con- 
tinuous variable? A discontinuous variable? 

2.6. Which of the distributions in Exercises 2.1 through 2.4 represents a 
population? A sample? 

2.7. Draw a frequency histogram and a frequency polygon of the distribu- 
tion in Table 2.6 of excess yardage of 100 denier acetate yarn over 
the specified minimum (99,000 yards per bobbin). 


Table 2.6

Excess Yardage    Number      Excess Yardage    Number
in Hundreds of    of          in Hundreds of    of
Yards             Bobbins     Yards             Bobbins

 0-1                 2        16-17                19
 1-2                 0        17-18                47
 2-3                 0        18-19                57
 3-4                 0        19-20                48
 4-5                 0        20-21                42
 5-6                 3        21-22                18
 6-7                 0        22-23                29
 7-8                 0        23-24                24
 8-9                 8        24-25                10
 9-10                8        25-26                 2
10-11                7        26-27                 2
11-12               13        27-28                 1
12-13               26        28-29                 3
13-14               31        29-30                 1
14-15               26        30-31                 3
15-16               21        Total               451


2.8. Draw a frequency histogram and cumulative relative frequency polygon
of the following distribution of time required to reach target reaction
temperature for hydrolysis of cellulose triacetate:

Time in Seconds      Number          Time in Seconds      Number
Required to Reach    of              Required to Reach    of
Temperature          Observations    Temperature          Observations

10-14                    2           55-59                   30
15-19                   32           60-64
20-24                  fse           65-69
25-29                  936           70-74
30-34                 2314           75-79
35-39                 J75T           80-84
40-44                  678           85-89
45-49                  234           90-94
50-54                   66           95-99

From the cumulative polygon, determine graphically what proportion of
the samples reach reaction temperature in less than 42 sec; between 20
sec and 4S sec.

2.9. The degree (arithmetic mean of 24 recordings in tenths) to which clouds
covered the sky during 1957 at the weather station in Roanoke, Virginia,
is given in Table 2.8.

Table 2.8

Sky Cover from
Midnight to           Number of
Midnight (tenths)     Days

 0                       15
 1                       19
 2                       25
 3                       23
 4                       28
 5                       41
 6                       37
 7                       42
 8                       41
 9                       39
10                       55

(a) Draw a relative frequency histogram and cumulative polygon. (b)
Determine the three quartiles and then half the difference between the
first and third quartile. For what proportion of the days was the sky
completely covered with clouds? Without a trace of clouds?

2.10. Draw a frequency and relative frequency histogram on the same graph
of the distribution of 1088 teak trees by three-inch diameter classes
listed in Table 2.9.*

Table 2.9

Diameter of
Trees (inches)     Frequency

 4.5- 7.5               8
 7.5-10.5              26
10.5-13.5              50
13.5-16.5             120
16.5-19.5             181
19.5-22.5             215
22.5-25.5             213
25.5-28.5             145
28.5-31.5              76
31.5-34.5              36
34.5-37.5              18

Total                1088

* A. L. Griffith and Bakshi Sant Ram, “The Silvicultural Research Code,” Vol. 2, The
Statistical Manual, Office of the Geodetic Branch, Survey of India, Dehra Dun, India, viii,
214 pp., 1947.

2.11. (a) Draw a frequency and relative frequency histogram on the same 
graph of the distribution of precipitation each day at the weather 
station in Roanoke, Virginia, as shown in Table 2.10. 

Table 2.10

Precipitation      Number of
in Inches          Days

Trace                 48
0.01-0.09             55
0.10-0.19             22
0.20-0.29             12
0.30-0.39             11
0.40-0.49              5
0.50-0.59              7
0.60-0.69              3
0.70-0.79              7
0.80-0.89              2
0.90-0.99              2
1.00 and over          6


(b) Use the histogram in writing a short summary (around 100 words) of 
the record of rainfall at Roanoke in 1957. 

2.12. The verbal scores of the 1958 Graduate Record Examination of 112 college 
seniors are as follows:

480 570 590 590 480 480 440 
500 500 480 570 620 440 390 
450 400 500 500 610 420 
550 390 500 470 530 540 610 
470 400 520 520 480 560 630 
480 390 640 420 €TO 560 650 
490 620 520 660 420 550 370 
530 340 480 380 440 480 480 
530 350 480 480 430 660 430 
530 470 570 480 530 470 400 
510 480 480 640 590 460 370 
460 480 420 380 440 610 500 
460 540 370 530 630 510 5S0 
500 480 530 480 430 640 520 
570 360 600 570 510 500 450 
560 510 540 370 420 660 


Use these data to construct a frequency table and a dot frequency diagram.

2.13. (a) Use the data in Exercise 2.12 to construct a frequency table with nine
appropriately chosen classes of equal length. (b) Draw the frequency
histogram and cumulative relative frequency polygon. (c) Determine what
proportion of the students had scores less than 400; greater than 600.
Find the median.

2.14. The body weights in grams of 62 male bobwhite quail trapped by Dr.
Vince Schultz in Ohio in 1946-47 and 1947-48 are as follows:

210.0   198.3   230.3   198.5   202.0
203.8   198.5   173.6   198.5   194.9
189.4   198.3   180.7   184.3   187.8
196.4   174.7   183.3   184.3   233.9
179.6   177.0   192.5   191.4   202.0
193.1   177.0   201.3   191.4   212.6
217.9   198.5   176.6   194.9   198.5
199.3   181.6   174.3   m2      212.6
217.8   198.5   163.4   198.5   233.9
215.6   194.8   199.7   177.2   184.3
220.2   184.3   191.8   184.3
181.4   216.2   166.5   194.9
198.5   212.6   198.5   202.0

(a) Construct a frequency table with appropriately chosen classes (discuss
with instructor). (b) Draw the histogram and the cumulative polygon
for these data. (c) Use (a) and (b) to write a short summary of the weight
distribution of these 62 quails.

2.15. Table 2.11 gives the maximum and minimum temperature in degrees
Fahrenheit for each day of the summer of 1957 (starting with June 21
and ending with September 23) at the weather station in Roanoke,
Virginia.

Table 2.11

Temperature (°F)      Temperature (°F)      Temperature (°F)
Maximum  Minimum      Maximum  Minimum      Maximum  Minimum

   83       64           92       70           78       53
   76       58           83       63           70       58
   68       56           83       56           82       62
   75       59           82       57           87       58
   87       57           84       63           86       60
   88       62           84       63           89       61
   78       54           88       63           96       66
   81       55           92       63           95       65
   87       54           94       67           93       64
   92       58           89       67           92       65
   82       67           94       62           89       66
   84       57           95       63           88       70
   90       55           86       71           87       62
   93       64           83       62           72       66
   91       71           81       53           77       63
   87       59           85       53           70       63
   92       62           90       57           69       63
   94       63           91       62           81       65
   85       69           89       70           85       62
   85       63           88       68           88       69
   88       54           93       66           91       64
   93       63           85       67           85       67
   93       65           85       68           83       63
   95       72           85       71           77       65
   94       67           91       72           68       64
   89       70           77       63           66       60
   80       70           65       61           66       60
   75       68           68       62           81       64
   88       66           80       62           87       66
   95       61           83       56           88       65
   98       67           85       57           73       53
   97       71           80       60


(a) Construct frequency tables for the maximum and minimum temperatures, 
letting the length of each class be five units. (b) Draw the cumulative 
relative frequency polygons on the same graph in order to compare the 
distributions of maximum and minimum temperatures. 

2.16. (a) Determine the difference in the maximum and minimum temperature 
for each day in Exercise 2.15 and construct a frequency table of these 
differences. (b) Write a short summary of the distribution of ranges of 
temperatures for the data in Exercise 2.15. 

2.17. Table 2.12 gives by states the personal income and the state debt per 
capita in 1955 (based on the estimated population on July 1, 1954, ex- 
cluding armed forces overseas) and the expenditure per pupil in average 
daily attendance for public elementary and secondary day school in 1954. 
                    Per Capita     Expenditure                      Per Capita     Expenditure 
State*             Debt    Income   per Pupil    State             Debt    Income   per Pupil 

Alabama          $         $1181     $151        Nebraska        $  2.49   $1540     $262 
Arizona            4.76     1577      282        Nevada             6.60    2434      294 
Arkansas          65.84     1062      139        New Hampshire     76.86    1732      256 
California        65.82     2271      345        New Jersey       161.67    2311      333 
Colorado          15.66     176(      280        New Mexico        36.56    1430      265 
Connecticut      )65 33     2499      297        New York          96.99    2263      362 
Delaware         344.92     2313      325        North Carolina    70.29    1236      177 
Florida           25.99     1654      229        North Dakota      34.41    1372      262 
Georgia           64.56     1333      177        Ohio              37(35    2062      254 
Idaho              3.92     1462      238        Oklahoma          89.40    1506      224 
Illinois          33.26     2257      319        Oregon           108.40    1834      337 
Indiana           75.71     1894      280        Pennsylvania     109.55    1902      299 
Iowa              10.92     im        274        Rhode Island      77.56    1957      268 
Kansas            85.60     1647      264        South Carolina    91.84    1108      176 
Kentucky          23.16     (238      153        South Dakota      29       1245      275 
Louisiana         79.58     1333      247        Tennessee         34.53    1256      166 
Maine            131.50     1593      m          Texas             >643     1614      249 
Maryland         177.47     1991      268        Utah               5.92    1553      208 
Massachusetts    167.39     2097      298        Vermont           19.98    JS35      245 
Michigan          73.75     2134      283        Virginia          31.36    1535      193 
Minnesota         26.92     1691      287        Washington        92.01    1987      305 
Mississippi       42.79      946      123        West Virginia    141.16    1288      186 
Missouri           2.69     1800      233        Wisconsin          1.30    1774      293 
Montana           70.09     1844      328        Wyoming           12.70    1753      330 

* Bureau of the Census, Statistical Abstract of the U.S., 1957. 

(a) Group the data and construct a frequency table for the per capita 
debt and draw the frequency histogram. (b) Group the data and construct 
a frequency table for the per capita income and draw the frequency 
histogram. (c) Group the data and construct a frequency table for the 
expenditure per pupil and draw the frequency histogram. 

2.18. (a) Draw a cumulative polygon for the per capita debt of Exercise 2.17 
and write a short summary of the outstanding features of the distribution. 
(b) Draw a cumulative polygon for the per capita income of Exercise 
2.17 and write a short summary of the outstanding features of the 
distribution. (c) Draw a cumulative polygon for the expenditure per pupil 
of Exercise 2.17 and write a short summary of the outstanding features 
of the distribution. 

2.19. Toss simultaneously five similar coins 50 times, recording the number of 
heads resulting from each toss. Draw a graph which pictures this 
information. 

2.20. (a) Combine the results from Exercise 2.19 of all the tosses of all the 
students in the class and draw a graph which pictures the information. 
(b) What proportion of the tosses resulted in the appearance of five heads? 
Zero heads? Two heads? Three heads? Do these proportions seem 
reasonable? See if you can compute mathematically the proportion of 
times five heads would occur if all coins are fair. 

2.21. Count the number of letters in each word of the first three paragraphs 
of Sect. 2.2.1 and construct a frequency table and frequency graph 
showing this information. 

2.22. Count the number of steps you take in walking by the most direct route 
(keeping off the grass) from the main door of the campus post office to 
the main door of the administration building. Obtain this same infor- 
mation from 29 other students and construct a table and graph of your 
joint findings. 

2.3. PARAMETERS 

The tabular and graphic methods described in Sect. 2.2 give very little 
accurate quantitative information about distributions. For example, we 
might see that two histograms differ, but find it impossible to describe how 
much they differ. Thus, some sort of arithmetical description is desirable. 
For certain distributions found in practice, particularly the bell-shaped and 
uniform, such a description can be satisfactorily brought about by means 
of two parameters — one which measures central tendency and the other, 
dispersion. For other distributions the two additional descriptive measures 
of symmetry and peakedness are also important. Measures of symmetry 
indicate to what degree the data are balanced about some central value, and 
measures of peakedness indicate how flat or peaked the distribution may be. 

In this section we think only of measures which characterize finite popu- 
lations and samples which are not to be used for purposes of statistical inference. 
We consider measures which characterize infinite populations and samples 
used for statistical inference in later sections. 

2.3.1. Measures of Central Tendency 

Observations have a tendency to cluster at some particular location on 
the scale of measurement. We shall now consider some parameters which 
measure central tendency of a set of observations, also called measures of 
location, and at the same time introduce some notation which will be stand- 
ard throughout the book. 

The most important measure of location, both practically and theoretically, 
is the arithmetic mean, or mean, as it is usually called. It is defined as 
the sum of all the variable values divided by the number of such values. 
If we denote the mean by $\mu$ (lower-case Greek letter mu) and $N$ observations 
have values denoted by $x_1, x_2, \ldots, x_N$, then, in symbols, the mean is 
defined by 

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

The mean can also be written as 

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{2.1}$$

where the $\Sigma$ (capital Greek letter sigma) is the usual symbol for the sum of 
the values indicated, that is 

$$\sum_{i=1}^{N} x_i = x_1 + x_2 + \cdots + x_N$$

For grouped data the mean may be approximated by using for computation 
the formula 

$$\mu \doteq \frac{\sum_{i=1}^{k} f_i x_i}{N} \tag{2.2}$$

where $f_i$ is the frequency and $x_i$ the mark of the $i$th class, $k$ is the 
number of classes, and 

$$N = \sum_{i=1}^{k} f_i$$

is the total number of observations. If the number of observations is large 
and the class lengths are sufficiently small, the approximation to the mean 
$\mu$ given by Eq. (2.2) is quite good. The saving in time of computation is 
considerable for a large set of data, especially if a frequency table is to be 
prepared anyway. If the observations are not grouped but a frequency table is 
prepared, as in Table 2.5, Formula (2.2) gives the exact value of the mean $\mu$. 

The data in Sect. 2.2.1 relating to the percentage of humans 65 years old 
and over in the United States in 1950 will be used to illustrate the 
applications of Formulas (2.1) and (2.2). Letting $x_1$ denote the percentage in 
Alabama, $x_2$ the percentage in Arizona, ..., $x_{48}$ the percentage in Wyoming 
and using the percentages from Table 2.1 in Formula (2.1) gives 

$$\mu = \frac{6.5 + 5.9 + \cdots + 6.3}{48} = \frac{387.2}{48}$$

or 

$$\mu = 8.067 \text{ per cent}$$

Formula (2.2) may be applied to the arranged data in Table 2.2. Letting 
the distinct values be $x_1 = 4.9$, $x_2 = 5.4$, ..., 6.5, ..., 8.7, ..., 10.8, 
with frequencies $f_1 = 1$, $f_2 = 1$, ..., 2, ..., 4, ..., 1, we have on 
substituting in Formula (2.2) 

$$\mu = \frac{1(4.9) + 1(5.4) + \cdots + 2(6.5) + \cdots + 4(8.7) + \cdots + 1(10.8)}{48} = \frac{387.2}{48} = 8.067 \text{ per cent}$$

Clearly, these two procedures lead to the same correct (except for 
rounding-off errors) arithmetic mean of the population of 48 percentages. 
Applying Formula (2.2) to the grouped data in Table 2.3 gives a slightly 
different value, which we denote by $\mu_g$. Let $x_1 = 5.05$, $x_2 = 6.05$, ..., 
$x_7 = 11.05$; then $f_1 = 3$, $f_2 = 6$, ..., $f_7 = 1$, and by Formula (2.2) 

$$\mu_g = \frac{3(5.05) + 6(6.05) + \cdots + 1(11.05)}{48} = \frac{386.4}{48}$$

or 

$$\mu_g = 8.050 \text{ per cent}$$

This example involving percentages was used only to illustrate the appli- 
cations of Formulas (2.1) and (2.2). It is not necessarily intended that this is 
the most suitable measure of central tendency for this example. The nature 
of a distribution and the nature of the unit of measure of the variable, as we 
shall see later, determine to a large extent the type of descriptive measure 
which is most appropriate. 
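
The two ways of computing the mean can be compared with a short Python sketch. The observations, class limits, and function names below are hypothetical (they are not the data of Table 2.1); the sketch only assumes that Formula (2.1) divides the sum of the values by N and that Formula (2.2) weights each class mark by its frequency.

```python
from collections import Counter


def mean(values):
    """Formula (2.1): sum of the values divided by their number N."""
    return sum(values) / len(values)


def mean_from_frequencies(marks, freqs):
    """Formula (2.2): sum of f_i * x_i divided by N = sum of f_i."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(marks, freqs)) / n


# Hypothetical observations, used only for illustration.
data = [4.9, 5.4, 6.5, 6.5, 7.2, 8.7, 8.7, 8.7, 9.1, 10.8]

# Ungrouped frequency table: every distinct value is its own class,
# so Formula (2.2) reproduces the mean of Formula (2.1).
table = Counter(data)
exact = mean_from_frequencies(list(table.keys()), list(table.values()))

# Grouped data: classes 4-6, 6-8, 8-10, 10-12 with marks at the class
# centers. Each value is replaced by its class mark, so the result is
# only an approximation to the mean.
marks = [5.0, 7.0, 9.0, 11.0]
freqs = [2, 3, 4, 1]
approx = mean_from_frequencies(marks, freqs)

print(round(mean(data), 3), round(exact, 3), round(approx, 3))
```

With these made-up values the grouped result differs slightly from the other two, which is the same effect seen above when Formula (2.2) is applied to Table 2.3.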

Geometrically, the mean is that point on the x-axis where a uniform and 
vertical sheet of metal in the shape of a histogram would balance on a pivot. 
Thus, we may think of the arithmetic mean as being located at the centroid 
or center of gravity of a distribution. 

At this time, we give definitions of other measures of central tendency, 
for the most part leaving the treatment of their application and the discus- 
sion of some of the advantages and disadvantages of each for a more ap- 
propriate place. The three types of measures of location commonly used are 
the median, the mode, and the means (geometric and harmonic as well as 
arithmetic). 

Arrange the $N$ values of the variable $x$ in increasing (or decreasing) 
order of magnitude and denote them by $x_{(1)}, x_{(2)}, \ldots, x_{(N)}$. Let $m_e$ 
denote the median of the $N$ values. If $N$ is odd, the median is the middle 
value; if $N$ is even, it is taken as the mid-point of the middle pair of 
values. We have already learned that the median is the same as the second 
quartile, the fifth decile, or the fiftieth percentile. If the data are 
grouped, the median can be thought of geometrically as that point on the 
x-axis which has equal areas under the histogram on both sides of a vertical 
line through the point. The median is important in certain situations and 
ranks second to the mean in usefulness. For example, the median income is a 
more meaningful measure of income in the state of Delaware, say, than the 
arithmetic mean. This is so since a few very large incomes would not affect 
the typical measure of income, namely the median, but would increase the mean 
to the extent that it would be worthless as a typical measure of income. 

The mode, $M$, is that value of $x$ in a collection which occurs with 
maximum frequency. If there are $q$ values of $x$ which have the same maximum 
frequency, then there are $q$ modes. 

The geometric mean, $G$, is the $N$th root of the product of the $N$ values 
$x_1, x_2, \ldots, x_N$, that is 

$$G = \left( \prod_{i=1}^{N} x_i \right)^{1/N}$$

where $\prod$ (capital Greek letter pi) denotes the product of $N$ values. Note 
that the logarithm of the geometric mean is the arithmetic mean of the 
logarithms of the observations. The geometric mean is not used if any of 
the variable values are negative or zero. It is used chiefly in averaging rates 
or ratios rather than quantities, and thus has rather restricted application. 

The harmonic mean, $H$, of $N$ values $x_1, x_2, \ldots, x_N$ (all different from 
zero) is the reciprocal of the arithmetic mean of their reciprocals, that is 

$$\frac{1}{H} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{x_i}$$

This measures the average of time rates and is also very limited in its 
application. 

Among other measures of location there is the mid-range, which is the 
mid-value of the largest and smallest of the $N$ values, that is 

$$\text{mid-range} = \frac{x_{(1)} + x_{(N)}}{2} \tag{2.5}$$

where $x_{(1)}$ and $x_{(N)}$ denote the smallest and largest values, respectively. 
Experiments will be described later (Exercises 2.3.4) in which each of 
these measures of location is considered most appropriate. They are defined 
at this time to show that there is more than one measure of location, and to 
indicate that the experimenter does not automatically select the arithmetic 
mean when considering central tendency; he must have some basis for 
selecting a measure of location. However, the arithmetic mean is the only 
measure of central tendency which is readily amenable to theoretical 
treatment, and it is, as we shall see later, in a certain sense the “best 
measure of location” of some of the most widely used distributions. 
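
As a rough illustration of the measures of location defined in this section, the following Python sketch evaluates each of them with the standard `statistics` module. Both data sets and the helper name `mid_range` are made up for the illustration and do not come from the text.

```python
import statistics


def mid_range(values):
    """Mid-value of the smallest and largest observations."""
    return (min(values) + max(values)) / 2


# Hypothetical observations (eight values, so N is even).
values = [3, 1, 4, 1, 5, 9, 2, 6]

arithmetic = statistics.mean(values)   # sum of the values divided by N
median = statistics.median(values)     # mid-point of the middle pair
modes = statistics.multimode(values)   # every value of maximum frequency
middle = mid_range(values)

# Hypothetical positive rates for the geometric and harmonic means.
rates = [1, 4, 16]
geometric = statistics.geometric_mean(rates)  # N-th root of the product
harmonic = statistics.harmonic_mean(rates)    # reciprocal of mean reciprocal

print(arithmetic, median, modes, middle)
print(round(geometric, 4), round(harmonic, 4), statistics.mean(rates))
```

For the positive values in `rates` the harmonic mean is the smallest of the three means and the arithmetic mean the largest.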

2.3.2. Measures of Dispersion 

We are likely to wonder how typical a particular measure of location 
really is, that is, how many variable values are near this single measure. 
Intuitively, we feel the answer must depend on the degree of cluster about 
the location parameter. Further, in order to compare two or more 
distributions with the same numerical value of some particular location 
parameter, we naturally turn to some numerical measure of the amount of 
scatter or dispersion in the distributions. In particular, a buyer of metal 
bolts of a given diameter would rather deal with that manufacturer making the 
most uniform bolts, that is, bolts with diameters having the least amount of 
scatter. In order to be more concrete and to have material for illustration, 
we shall consider the five histograms in Fig. 2.8, which have means located 
at 4 or near 4, but which obviously have different amounts of scatter. 

If we assume that a measure of dispersion is desirable, the question is 
how do we determine it? In order to answer this question as well as certain 
others, we set about defining and illustrating some of the most common 
measures, namely, range, mean deviation, variance, standard deviation, and 
coefficient of variation. It should be noted that all these measures would 
not be calculated in any given experiment. Generally, only one would be 
selected, that one being the measure which seems most appropriate for the 
particular experiment. The variance and standard deviation are of greatest 
importance both theoretically and practically, and are the measures which 
we naturally turn to first in a given experimental situation. 

The range, $R$, is given by 

$$R = x_{(N)} - x_{(1)} \tag{2.6}$$

where $x_{(N)}$ and $x_{(1)}$ are the largest and smallest of the $N$ values of the 
variable, respectively. It is seldom used as a descriptive parameter of a 
population, since it indicates very little about the way the distribution 
appears inside the interval of values. For example, the range of each 
distribution in Fig. 2.8 is 7, but the distributions are obviously quite 
different. For a very small number of values the range might be considered 
satisfactory. However, this measure is mainly attractive because it is 
computationally convenient and because of the very small size of repetitive 
samples obtained in certain industrial procedures. 

The amount of scatter is clearly dependent on how much the set of values 
deviates from some central value. The greater the scatter, the larger the 
total deviation. Since the total deviation depends on the number of values as 
well as the amount of scatter, we think in terms of “average deviation” to 
avoid the difficulty of number of values. 

The mean deviation, $\delta_m$, is defined to be the arithmetic mean of the 
absolute values of the deviations of the observations from the median, that is 

$$\delta_m = \frac{1}{N} \sum_{i=1}^{N} |x_i - m_e| \tag{2.7}$$

where $m_e$ denotes the median and $|x_i - m_e|$ is never negative. Until 
recently the mean deviation was almost always defined in terms of deviations 
from the mean $\mu$, but deviations from the median are computationally and 
theoretically easier to use. Whenever the mean and median are the same, 
the mean deviation about the mean would be the same as $\delta_m$; otherwise, 
they would differ. In any case, the mean deviation is very difficult, if not 
impossible, to use in certain important theoretical work and hence is not as 
useful as it might seem at first; the computation of absolute values limits 
its use in theory. 

Since the absolute value causes trouble, it is natural that we try to avoid 
its use. For this and other reasons we define the variance, which is 
computationally more difficult, but theoretically quite satisfactory. 

The variance, $\sigma^2$ (lower-case Greek letter sigma squared), is defined to 
be the arithmetic mean of the squares of the deviations from the mean $\mu$; 
that is 

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \tag{2.8}$$

The standard deviation is the square root of the variance and is denoted by 
$\sigma$. This gives a measure of dispersion which is expressed in terms of the 
unit of measure of the variable values. For grouped data the variance may be 
approximated by using the formula 

$$\sigma^2 \doteq \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N} \tag{2.9}$$

where $f_i$ is the frequency, $x_i$ is the mark of the $i$th class, $k$ is the 
number of classes, and $N$ the total number of observations. If the 
observations are not grouped but a frequency table is prepared as in 
Table 2.5, Formula (2.9) gives the exact value of the variance $\sigma^2$. 

To illustrate the applications of Formulas (2.8) and (2.9), we use the 
data relating to the percentage of humans over 64 years old in the United 
States in 1950. Referring to Table 2.1 and using the notation of Sect. 2.3.1 as 
well as the value $\mu = 8.067$ per cent, we have, from Formula (2.8) 

$$\sigma^2 = \frac{(6.5 - 8.067)^2 + (5.9 - 8.067)^2 + \cdots + (6.3 - 8.067)^2}{48}$$

or 

$$\sigma^2 = 2.13 \text{ square per cent}$$

If we use Table 2.2 and the notation of Sect. 2.3.1, Formula (2.9) may be 
applied to obtain 

$$\sigma^2 = \frac{1(4.9 - 8.067)^2 + 1(5.4 - 8.067)^2 + \cdots + 2(6.5 - 8.067)^2 + \cdots + 4(8.7 - 8.067)^2 + \cdots + 1(10.8 - 8.067)^2}{48}$$

or 

$$\sigma^2 = 2.13 \text{ square per cent}$$

Applying Formula (2.9) to the grouped data in Table 2.3 and using the 
notation of Sect. 2.3.1, we have 

$$\sigma_g^2 = \frac{3(5.05 - 8.067)^2 + 6(6.05 - 8.067)^2 + \cdots + 1(11.05 - 8.067)^2}{48}$$

or 

$$\sigma_g^2 = 2.29 \text{ square per cent}$$

The standard deviation is 

$$\sigma = \sqrt{2.13} = 1.46 \text{ per cent}$$

From $\sigma_g^2$ we find that the standard deviation is approximately 
$\sigma_g = 1.51$ per cent. It is clear that “per cent” is a more meaningful 
measure than the “square per cent” used for $\sigma^2$. 

The coefficient of variation, $v$, of a set of observations is simply the 
standard deviation divided by the arithmetic mean, that is, 

$$v = \frac{\sigma}{\mu}$$

This parameter is used mainly to bring out the degree of spread of the 
observations in terms of the mean. It is useful in comparing two 
distributions with widely differing means (see Exercise 2.43). For the above 
example 

$$v = \frac{1.46}{8.067}$$

or 

$$v = 0.181$$

which is a dimensionless measure. 

Note. Often $v$ is defined as the standard deviation expressed as a 
percentage of the arithmetic mean, that is, $v = 100\sigma/\mu$ per cent. 

The relative merits of these measures of dispersion will be discussed more 
fully later. They were introduced at this time to show that scatter can be 
measured in several meaningful ways, and to indicate that the investigator 
does not automatically select the standard deviation, say, when considering 
scatter; he must have some basis for selecting a measure of dispersion. 
Perhaps it should be mentioned that the mean and standard deviation, the 
median and mean deviation, and the mid-range and range are used together as a 
general rule. Furthermore, the mean and variance are used almost altogether 
for the large body of distributions which are roughly bell-shaped. 
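
The measures of dispersion of this section can be gathered into one small Python sketch. The function name `dispersion_summary` and the eight observations are hypothetical; the sketch assumes the finite-population definitions used here, that is, divisor N for the variance and deviations from the median for the mean deviation.

```python
import math
import statistics


def dispersion_summary(values):
    """Return the dispersion measures of a finite population of values."""
    n = len(values)
    mu = sum(values) / n
    median = statistics.median(values)
    variance = sum((x - mu) ** 2 for x in values) / n  # divisor N, not N - 1
    sigma = math.sqrt(variance)
    return {
        "range": max(values) - min(values),
        # mean of the absolute deviations taken from the median
        "mean_deviation": sum(abs(x - median) for x in values) / n,
        "variance": variance,
        "standard_deviation": sigma,
        # dimensionless; only meaningful when the mean is not zero
        "coefficient_of_variation": sigma / mu,
    }


# Hypothetical observations, chosen so the arithmetic is easy to check.
summary = dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in summary.items():
    print(name, round(value, 3))
```

The variance computed this way agrees with `statistics.pvariance`, which also uses the divisor N.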
2.3.3. Use of Mean and Standard Deviation in Bell-Shaped Distributions 

The standard deviation may be used to describe the frequency with which 
observations fall near the mean or in an interval about the mean. In 
particular, if the distribution is bell-shaped and the number of observations 
in the population is reasonably large, then approximately 2/3 of the 
observations fall in the interval from $\mu - \sigma$ to $\mu + \sigma$, 19/20 of the 
observations fall in the interval from $\mu - 2\sigma$ to $\mu + 2\sigma$, and 997/1000 of 
the observations fall in the interval from $\mu - 3\sigma$ to $\mu + 3\sigma$. This 
property is easily illustrated by using the data of Tables 2.2 and 2.5. 

Example 2.1. Find the proportion of observations in Table 2.2 falling 
in the three intervals defined above. 

We know that $\mu = 8.07$ and $\sigma = 1.46$. Thus, the interval from $\mu - \sigma$ to 
$\mu + \sigma$ becomes 6.61-9.53. By actual count 30 observations, or 5/8 of the 
observations, fall in this interval; that is, the relative frequency of 
observations in this interval is 0.625. Also, 47 of 48 observations fall in 
the interval from $\mu - 2\sigma = 5.15$ to $\mu + 2\sigma = 10.99$, and 48 observations 
fall in the interval from $\mu - 3\sigma = 3.69$ to $\mu + 3\sigma = 12.45$. That is, the 
relative frequencies of observations in these intervals are 0.979 and 1.000. 
Even though the size of the population is not very large, the proportion of 
observations falling in the intervals is fairly close to those (0.667, 0.950, 
0.997) given above. 

For the grouped data, $\mu_g = 8.05$ and $\sigma_g = 1.51$. Thus, the intervals are 
6.54-9.56, 5.03-11.07, and 3.52-12.58, with the relative frequencies of 
observations falling in these intervals being 0.646, 0.958, and 1.000, 
respectively. 

For either the grouped or the ungrouped data we can see that roughly 
2/3 of the states have percentages falling within one standard deviation, 1.5, 
of the mean, 8.1. Thus, without counting we might have felt reasonably sure 
that approximately 32 states have percentages from 6.6 to 9.6. For large 
masses of observations this use of the mean and standard deviation furnishes 
an easy way to describe numerically the nature of the observations, and also 
saves time in determining the proportion of observations between two 
boundaries. 
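
The counting done in Example 2.1 can be sketched in Python as follows. The frequency table is hypothetical (a small symmetric, bell-shaped set of integer values, not the data of Table 2.2 or Table 2.5), and the function name `proportion_within` is introduced only for this illustration.

```python
import math


def proportion_within(values, k):
    """Relative frequency of values between mean - k*sd and mean + k*sd."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    low, high = mu - k * sigma, mu + k * sigma
    return sum(low <= x <= high for x in values) / n


# Hypothetical bell-shaped population: value -> frequency.
frequencies = {1: 1, 2: 6, 3: 15, 4: 20, 5: 15, 6: 6, 7: 1}
population = [x for x, f in frequencies.items() for _ in range(f)]

for k in (1, 2, 3):
    print(k, round(proportion_within(population, k), 3))
```

Because these made-up values are discrete and few in number, the one-standard-deviation proportion is noticeably above 2/3; spreading each class uniformly over its interval, as described for the fruit fly data, usually brings such counts closer to the stated fractions.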

The fruit fly data found in Table 2.5 may be used to illustrate the above 
methods for a large population. It is found in Sect. 2.4 that $\mu = 19.21$ and 
$\sigma = 2.11$. Thus, we would expect roughly 2/3 of the observations, or 539 
observations, to fall in the interval from 17.10-21.32, that is, that roughly 
539 female fruit flies have 18, 19, 20, or 21 bristles on the sixth abdominal 
sternite. By counting, we find 508 observations or 62.9 per cent of the 
observations with 18, 19, 20, or 21 bristles. The estimated number of 
observations, 539, differs from the actual number, 508, more than we might 
wish. This difficulty can generally be overcome by using a histogram and 
thinking momentarily of the number of bristles possible as being uniformly 
distributed over the continuum of values for each class. In this case the 
actual number would be 533.2, which is fairly close to the estimated value, 
539. 

2.3.4. Exercises 

In this set of exercises, as well as others, we give some problems which 
are designed to amplify the concepts already introduced and some which are 
designed to bring out other concepts defined or explained for the first time 
in the exercise. The references at the end of the chapter may be consulted if 
further, more detailed information relating to these concepts is desired. 

In Exercises 2.23 through 2.31, use the values indicated in Fig. 2.8. In 
order that the calculations be exact, think of all the variable values in a 
class as falling at the class mark unless otherwise indicated. Thus, for 
example, in Fig. 2.8e, the ordered values are 1, 1, 1, 3, 3, ..., 7. 

2.23. Determine the median, mid-range, and mode for the values in each figure. 

2.24. Determine the mean for the values in each figure. 

2.25. Compute the variance and standard deviation for the values in Figs. 
2.8a, b, and c. 

2.26. Compute the variance and standard deviation for Figs. 2.8d and e. 

2.27. Compute the mean deviation for the values in Figs. 2.8a, b, and c. 

2.28. Compute the mean deviation for the values in Figs. 2.8d and e. 

2.29. Let 

$$\delta_\mu = \frac{1}{N} \sum_{i=1}^{N} |x_i - \mu| \tag{2.11}$$

be the definition of the mean deviation about the mean. Compute $\delta_\mu$ for 
the values in Figs. 2.8d and e. Compare these values with $\delta_m$ found in 
Exercise 2.28. 

2.30. Compute the coefficient of variation for the values in each of the 
figures. 

2.31. Bring the measures of dispersion computed together in one table and 
write a summary comparing these measures with each other and relating 
them to the amount of spread in the figures. 

2.32. Assuming the values within each class to be distributed uniformly, that 
is, all values in the class to occur with the same frequency or relative 
frequency, determine what proportion of the observations fall between 
$\mu - \sigma$ and $\mu + \sigma$ for Figs. 2.8a, b, and c. 

Answer. 0.57, 0.51, and 0.60. 

Hint. Let the base of the $i$th rectangle in a histogram be $c_i$ and the 
area be $f_i$, where $f_i$ is the frequency of the class with boundaries $x_i$ and 
$x_i' = x_i + c_i$. Then the height of the rectangle is $f_i/c_i$, and the area $f_0$ 
of that part of the rectangle to the left of the line perpendicular to the 
x-axis through any point $x_0$ between $x_i$ and $x_i + c_i$ is given by 

$$f_0 = \frac{(x_0 - x_i) f_i}{c_i} \tag{2.12}$$

The area of that part of the rectangle to the right of $x_0$ is given by 

$$f_i - f_0$$

or 

$$\frac{(x_i' - x_0) f_i}{c_i}$$

Thus, $f_0$ is the frequency of those values from $x_i$ through $x_0$, and 
$f_i - f_0$ is the frequency of those values from $x_0$ through $x_i'$. 

2.33. Using the assumption and method of Exercise 2.32, determine what 
proportion of the observations fall between $\mu - \delta_m$ and $\mu + \delta_m$ in 
Figs. 2.8a, b, and c. 

2.34. (a) Show that when a histogram is used the number of observations 
between $\mu - \sigma$ and $\mu + \sigma$ in the fruit fly example (last paragraph of 
Sect. 2.3.3) is actually 533.2. (b) Find the proportion of observations 
between $\mu - 2\sigma$ and $\mu + 2\sigma$. Answer. 0.958. 


2.35. (a) Find the arithmetic, geometric, and harmonic means of the numbers 
2 and 8. (b) Find the arithmetic, geometric, and harmonic means of the 
numbers 5, 8, and 25. (c) Compare these three means. 


Note. For positive numbers the harmonic mean is always the smallest, 
and the arithmetic mean is the largest. 

2.36. A student on successive tests answered correctly 8 out of 40 questions, 
36 out of 60 questions, and 18 out of 20 questions. For example, on the 
second test he did three times as well as he did on the first. Find the 
average rate of improvement (geometric mean) between successive tests. 

Answer. $G = 3\sqrt{1/2} = 2.12$. 

Note. Suppose we try to use the arithmetic mean (which is 9/4) as 
the correct average rate to determine the third grade, assuming we do 
not know any grade except the first (which is 20, based on 100 points). 
This would give $20(\tfrac{9}{4})(\tfrac{9}{4}) = \tfrac{405}{4} \doteq 101$, which is an impossible 
grade. On the other hand, using the geometric mean, we obtain 

$$20\left(3\sqrt{1/2}\right)\left(3\sqrt{1/2}\right) = 90$$

which is the correct grade. 

2.37. A car travels 40 miles per hour for 50 miles in one direction and 60 miles 
per hour on the return trip. Find the average (harmonic mean) rate per 
hour for the round trip. 

Note. The arithmetic mean of 50 miles per hour is not appropriate, 
as can be checked by determining the time required to travel the 100 
miles. Observe that the term “average” might refer to either the arithmetic, 
geometric, or harmonic mean and for this reason is not often used in 
statistics. 


2.38. Table 2.13 gives the foul-shooting record of the five regular basketball 
players on a university team: 

Name       Number of Shots Attempted    Percentage of Foul Shots Made 

Jim                  140                             50 
                     300 
Harley               120                             70 
Clay                 100 
Dewitt               240 

Find the percentage of foul shots made by the team (that is, the five 
regular players). 

Hint. In problems of this type the measurements should be weighted 
according to the number of objects having each measurement, the relative 
importance of each measurement, etc. Thus, the weighted arithmetic 
mean of a set of values $x_1, x_2, \ldots, x_N$ which have weights $w_1, w_2, \ldots, w_N$, 
respectively, is given by 

$$\mu_w = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i} \tag{2.13}$$

It should be observed that the weights need not be integers. 
2.39. (a) The median and mean for the values indicated in Fig. 2.8e are 
each 4. Change only one value so that a 7 becomes 21 and then compute 
the median and mean. (b) Compare these measures of location with 4, 
noting the effect that only one large (or small) value has on the mean 
and that the mean changes in the direction in which the distribution is 
skewed. 

Note. In a symmetrical distribution the arithmetic mean, the median, 
and the mode (antimode, as in Fig. 2.8b) are the same, but for skewed 
distributions they differ, the median falling between the arithmetic mean 
and mode. In fact, for moderately skewed unimodal distributions these 
three measures are approximately related by the formula 

$$\mu - M = 3(\mu - m_e) \tag{2.14}$$

according to empirical evidence and a mathematical explanation by 
Doodson [4]. 


2.40. (a) Prove that $\sigma = R/2$ when $N = 2$. (b) Let three values ordered from 
highest to lowest be denoted by $x_1, x_2, x_3$. Then 

$$R = x_1 - x_3 = (x_1 - x_2) + (x_2 - x_3) = k_1 R + k_2 R$$

where $k_1 = (x_1 - x_2)/(x_1 - x_3)$ and $k_2 = (x_2 - x_3)/(x_1 - x_3)$. Prove that 

$$\sigma = \frac{R}{3} \sqrt{2(k_1^2 + k_1 k_2 + k_2^2)} \tag{2.15}$$

(c) Show that Formula (2.15) becomes $\sigma = R\sqrt{2}/3$ when $x_2 = x_3$ and 
$\sigma = R/\sqrt{6}$ when $x_2$ is halfway between $x_1$ and $x_3$. (d) Find $\sigma$ for the 
values 1, 2, 7 using the definition; using Formula (2.15). 





2.41. (a) Prove that $\sum (x_i - \mu) = 0$. (b) Illustrate that this formula is true 
for values 1, 2, 2, 3, 4, 6, 10. (c) Find $\sum (x_i - m_e)$ for values in (b). 

2.42. (a) Verify that $\delta_m < \delta_\mu$ for the values in Exercise 2.41(b). (b) Show 
that for $N$ values $x_1, x_2, \ldots, x_N$ the mean deviation about an arbitrary 
value can never be less than $\delta_m$. 

2.43. Is there more variation in the weights of female albino rats or in the 
weights of female humans? Use the following samples to answer this 
question: 

Weights of rats in grams: 205, 190, 180, 230, 215, 195, 210, 170, 215, 190. 
Weights of humans in pounds: 105, 130, 110, 130, 120, 140, 105, 95, 130, 
115, 135, 125. 

Hint. Use the coefficient of variation to compare these two samples, 
realizing that the conclusion might not be the same for the populations. 

2.44. (a) Use the distribution in Exercise 2.39 to compute skewness by the 
following two formulas: 

$$SK = \frac{\mu - M}{\sigma} \tag{2.16}$$

$$\alpha_3 = \frac{1}{N} \frac{\sum_{i=1}^{N} (x_i - \mu)^3}{\sigma^3} \tag{2.17}$$

(b) Changing both values from 7 to 21 in Exercise 2.39, compute $SK$ and 
$\alpha_3$ and compare with the measures obtained in (a). 

Note. $SK$ is called the Pearsonian coefficient of skewness, since he 
introduced it, realizing that $\mu$ and $M$ differ in case the distribution is 
skewed. From the definition it is clear that $SK$ is positive for a 
distribution skewed to the right and negative for a distribution skewed to 
the left, since the mean tends to go in the direction in which a distribution 
is skewed. The most important and widely used measure of skewness is $\alpha_3$ 
(lower-case Greek letter alpha). This is based on moments of the distribution. 

As a distribution becomes more skewed both $SK$ and $\alpha_3$ become 
numerically larger, but not at the same rate. Actually these two measures 
are used more generally to compare asymmetry of two distributions. 
There are several other measures of skewness or asymmetry, but the 
two given here should be sufficient. 

From Fig. 2.8e we see that $SK$ is zero, since $M = 4 = \mu$. Thus, if 
the measure of skewness (for example, $SK$) is zero, it does not necessarily 
follow that the distribution is symmetric. 


2.45. Karl Pearson in 1906 introduced for the purpose of measuring peakedness 
or kurtosis of a unimodal distribution the moment-ratio given by 

$$\alpha_4 = \frac{1}{N} \frac{\sum_{i=1}^{N} (x_i - \mu)^4}{\sigma^4} \tag{2.18}$$

Using the values from Fig. 2.8a, b, and c, compute $\alpha_4$ and compare in 
view of the note below. 

Note. For the normal distribution, which is a special bell-shaped 
distribution, $\alpha_4$ is 3, and this is the standard for comparison. 
Distributions with a ratio less than, equal to, or greater than 3 are known as 
platykurtic, mesokurtic, and leptokurtic, respectively. 


2.4. NUMERICAL CALCULATIONS 

There are several useful devices for shortening calculations involving 
the parameters already introduced. We shall consider at this time some of 
the most common procedures along with some illustrative examples. (The 
explanations are for the finite discrete populations. Other cases are 
considered in Chap. 5.) 

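Theorem 2.1. The variance of N values $x_1, x_2, \ldots, x_N$ is given by

$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2 \qquad (2.19)$$

or

$$\sigma^2 = \frac{N \sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}{N^2} \qquad (2.20)$$

Proof. From the definition of variance given in Formula (2.8), we obtain

$$N\sigma^2 = \sum_{i=1}^{N} (x_i - \mu)^2$$

Expanding the right-hand side of this equation, using the definition of $\mu$,
and collecting terms, we obtain the following:

$$\begin{aligned}
N\sigma^2 &= (x_1^2 - 2\mu x_1 + \mu^2) + (x_2^2 - 2\mu x_2 + \mu^2) + \cdots + (x_N^2 - 2\mu x_N + \mu^2) \\
&= (x_1^2 + x_2^2 + \cdots + x_N^2) - 2\mu(x_1 + x_2 + \cdots + x_N) + N\mu^2 \\
&= \sum_{i=1}^{N} x_i^2 - N\mu^2 \\
&= \sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}
\end{aligned}$$

Formulas (2.19) and (2.20) are obtained from these last two equations,
respectively, by dividing each by N.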

By a similar argument it can be shown that Formula (2.9) for grouped
values reduces to

$$\sigma^2 = \frac{\sum f_i x_i^2}{N} - \mu^2 \qquad (2.21)$$

or

$$\sigma^2 = \frac{N \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{N^2} \qquad (2.21a)$$

where $\mu = \sum f_i x_i / N$ and $N = \sum f_i$.

Theorem 2.1 is especially useful when a calculating machine is available,
Formula (2.20) being better than Formula (2.19) for most purposes, since
fewer recordings and one less division are required. It is also useful whenever
$\mu$ is a repeating decimal. These are illustrated in Example 2.2.

Example 2.2. Find $\sigma^2$ for the values 2, 3, 5.

Letting $x_1 = 2$, $x_2 = 3$, and $x_3 = 5$, we find that

$$\sum x_i = 10 \quad \text{and} \quad \sum x_i^2 = 38$$

Hence, by Formula (2.20),

$$\sigma^2 = \frac{3(38) - (10)^2}{3^2} = \frac{14}{9} \doteq 1.556$$

where $\doteq$ means “approximately equal to.” However, if we find $\mu$ first, the
following procedure is used:

$$\mu = \frac{10}{3} \doteq 3.3$$

and

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{(2 - 3.3)^2 + (3 - 3.3)^2 + (5 - 3.3)^2}{3} = \frac{4.67}{3} = 1.5566\ldots \doteq 1.557$$

The second method introduces a rounding-off error immediately, but the
rounding-off error need not be introduced in the first method unless we wish
to divide as a final step.
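
The comparison in Example 2.2 can be checked with exact rational arithmetic. The following is a minimal sketch; the function names are illustrative only and are not from the text.

```python
from fractions import Fraction


def variance_2_19(values):
    """Formula (2.19): sum(x^2)/N - mu^2, computed exactly."""
    n = len(values)
    mu = Fraction(sum(values), n)
    return Fraction(sum(x * x for x in values), n) - mu ** 2


def variance_2_20(values):
    """Formula (2.20): [N*sum(x^2) - (sum x)^2] / N^2, one division at the end."""
    n = len(values)
    return Fraction(n * sum(x * x for x in values) - sum(values) ** 2, n * n)


def variance_by_definition(values, mu):
    """Definition: sum((x - mu)^2)/N, with whatever value of mu is supplied."""
    return sum((x - mu) ** 2 for x in values) / len(values)


values = [2, 3, 5]
exact = variance_2_20(values)                       # 14/9, no rounding yet
from_rounded_mean = variance_by_definition(values, 3.3)  # mean rounded first

print(exact, float(exact), from_rounded_mean)
```

The exact value 14/9 is about 1.5556, while starting from the rounded mean 3.3 gives about 1.5567, which is the discrepancy the example points out.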

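Theorem 2.2. Adding a constant (positive or negative) to each of N values
adds the same constant to the mean but does not change the variance or
standard deviation.

Proof. Let $y_i = x_i + k$ $(i = 1, 2, \ldots, N)$, where $k$ is any constant. Let
$\mu_y$ and $\mu_x$ denote the means of the $y$ values and the $x$ values, respectively.
Then

$$\begin{aligned}
\mu_y &= \frac{\sum y_i}{N} \\
&= \frac{(x_1 + k) + (x_2 + k) + \cdots + (x_N + k)}{N} \\
&= \frac{(x_1 + x_2 + \cdots + x_N) + Nk}{N} \\
&= \frac{\sum x_i}{N} + \frac{Nk}{N} = \mu_x + k
\end{aligned}$$

Letting $\sigma_y^2$ and $\sigma_x^2$ denote the variances for the $y$ values and $x$ values,
respectively, we obtain

$$\begin{aligned}
\sigma_y^2 &= \frac{\sum (y_i - \mu_y)^2}{N} \\
&= \frac{\sum [(x_i + k) - (\mu_x + k)]^2}{N} \\
&= \frac{\sum (x_i - \mu_x)^2}{N} = \sigma_x^2
\end{aligned}$$

and

$$\sigma_y = \sigma_x$$

The reader may wish to show that Theorem 2.2 is true when the values are
grouped.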

Example 2.3. Use Theorems 2.1 and 2.2 to determine $\mu$ and $\sigma^2$ for the
values in Table 2.2.

The values are $x_1 = 4.9$, $x_2 = 5.4$, ..., $x_{21} = 8.0$, ..., $x_{48} = 10.8$. If
we let $k = -8.0$, it follows that $y_1 = -3.1$, $y_2 = -2.6$, ..., $y_{21} = 0.0$, ...,
$y_{48} = 2.8$,

$$\sum_{i=1}^{48} y_i = 3.2$$

and

$$\sum_{i=1}^{48} y_i^2 = 102.44$$

Thus, by definition

$$\mu_y = \frac{3.2}{48} = 0.067$$

and by Formula (2.20)

$$\sigma_y^2 = \frac{(48)(102.44) - (3.2)^2}{(48)^2} = \frac{4906.88}{2304} = 2.13$$

By Theorem 2.2

$$\mu_x = \mu_y - k = 0.067 - (-8.0) = 8.067$$

and

$$\sigma_x^2 = 2.13$$

Theorem 2.3. Multiplying each of N values by the same constant also
multiplies the mean by this constant, the standard deviation by this constant,
and the variance by this constant squared.

Proof. Let

$$y_i = k x_i \quad (i = 1, 2, \ldots, N)$$

where $k$ is a constant. Then, by definition and substitution

$$\mu_y = \frac{\sum y_i}{N} = \frac{\sum k x_i}{N} = \frac{k \sum x_i}{N} = k\mu_x$$

$$\begin{aligned}
\sigma_y^2 &= \frac{\sum (y_i - \mu_y)^2}{N} \\
&= \frac{\sum (k x_i - k \mu_x)^2}{N} \\
&= \frac{k^2 \sum (x_i - \mu_x)^2}{N} = k^2 \sigma_x^2
\end{aligned}$$

and

$$\sigma_y = k\sigma_x$$

(Here $k$ is taken to be non-negative; for a negative constant the standard
deviation is multiplied by $|k|$.)

The proofs for grouped data are similar.

Note. In the above proof the limits for the summation were dropped.
That is, $\sum$ replaced $\sum_{i=1}^{N}$.
From this point on, the limits of summation will be dropped when it does
not lead to confusion, that is, when it is clear from the context of the problem
what the limits are.

Theorem 2.4. If $y_i = (x_i - x_0)/K$, where $x_0$ is any constant and $K$ is any
constant different from zero, then $\mu_y = (\mu_x - x_0)/K$, $\sigma_y^2 = \sigma_x^2/K^2$, and
$\sigma_y = \sigma_x/K$.

The proof follows immediately from Theorems 2.2 and 2.3.
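
The coding transformation of Theorem 2.4 can be sketched as follows. The four data values are only those quoted in the text from Table 2.2 (the full table is not reproduced here), and the function name is hypothetical.

```python
from statistics import fmean, pvariance


def coded_mean_and_variance(xs, x0, K):
    """Code y = (x - x0)/K, compute on y, then decode using Theorem 2.4."""
    ys = [(x - x0) / K for x in xs]
    mu_y = fmean(ys)
    var_y = pvariance(ys, mu_y)          # population variance of the coded values
    return K * mu_y + x0, K ** 2 * var_y  # mu_x = K*mu_y + x0, var_x = K^2*var_y


# Assumption: a four-value excerpt, used purely for illustration.
xs = [4.9, 5.4, 8.0, 10.8]
mu_x, var_x = coded_mean_and_variance(xs, x0=8.0, K=0.1)

print(mu_x, var_x)
```

Decoding the coded results reproduces, up to floating-point error, the mean and variance computed directly from the original values.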
Example 2.4. Apply Theorem 2.4 to the data of Table 2.2.

Let $x_i$ be defined as in Example 2.3, let $x_0 = 8.0$ and $K = 1/10 = 0.1$.
Then

$$y_1 = \frac{4.9 - 8.0}{0.1} = \frac{-3.1}{0.1} = -31, \quad y_2 = -26, \quad \ldots, \quad y_{48} = 28$$

$$\sum y_i = 32$$

and

$$\sum y_i^2 = 10244$$

Thus, by definition

$$\mu_y = \frac{32}{48} = 0.67$$

and by Formula (2.20)

$$\sigma_y^2 = \frac{(48)(10244) - (32)^2}{(48)^2} = 213$$

From Theorem 2.4, substituting the above values, we have

$$\mu_x = K\mu_y + x_0 = \tfrac{1}{10}(0.67) + 8.0 = 8.067$$

$$\sigma_x^2 = K^2 \sigma_y^2 = \tfrac{1}{100}(213) = 2.13$$

and

$$\sigma_x = K\sigma_y = \tfrac{1}{10}\sqrt{213} = 1.46$$

Thus, the mean, variance, and standard deviation obtained by this method
are the same as those obtained from the definition. As the numbers get larger
and the frequency increases, transformations of this kind become more
important as time-saving devices and aids to limiting errors in computation.

Theorem 2.5. The mean deviation is the sum of the values in the higher
half-range minus the sum of the values in the lower half-range divided by N;
that is

$$\delta_m = \frac{\sum \text{higher half of values} - \sum \text{lower half of values}}{N} \qquad (2.22)$$

the median being excluded if N is an odd number.

Proof. Let the values be arranged in decreasing order of magnitude and
denoted by

$$x_{(1)}, x_{(2)}, \ldots, x_{(N)}$$

where $x_{(1)}$ is the largest value and $x_{(N)}$ is the smallest value. Let $x_{(a)}$ be the
smallest value equal to or greater than $\mu_m$ and $x_{(b)}$ be the largest value equal
to or less than $\mu_m$. Then the first half of values $x_{(1)}, x_{(2)}, \ldots, x_{(a)}$ represent
$\mu_m$ or points to the right of $\mu_m$, and the second half of values $x_{(b)}, \ldots, x_{(N)}$
represent $\mu_m$ or points to the left of $\mu_m$. The numerical distance between two
points on opposite sides of $\mu_m$ is the value to the right minus the value to
the left; it is also the distance between the point to the left and $\mu_m$ plus the
distance between the point to the right and $\mu_m$. In particular

$$x_{(1)} - x_{(N)} = |x_{(1)} - \mu_m| + |x_{(N)} - \mu_m|, \text{ etc.} \qquad (2.23)$$

Now, from the definition of mean deviation, we may write

$$\delta_m = \frac{\sum_{i=1}^{N} |x_{(i)} - \mu_m|}{N}$$

or, rearranging terms, we have

$$\delta_m = \frac{(|x_{(1)} - \mu_m| + |x_{(N)} - \mu_m|) + (|x_{(2)} - \mu_m| + |x_{(N-1)} - \mu_m|) + \cdots + (|x_{(a)} - \mu_m| + |x_{(b)} - \mu_m|)}{N} \qquad (2.24)$$

Substituting Eq. (2.23) in Eq. (2.24) and collecting terms gives

$$\delta_m = \frac{(x_{(1)} + x_{(2)} + \cdots + x_{(a)}) - (x_{(N)} + x_{(N-1)} + \cdots + x_{(b)})}{N}$$

or

$$\delta_m = \frac{\sum_{i=1}^{a} x_{(i)} - \sum_{i=b}^{N} x_{(i)}}{N} \qquad (2.25)$$

Example 2.5. Find the mean deviation for the data of Table 2.2.

Using the notation of Example 2.3 and the definition, we find the median
to be

$$\mu_m = 8.45$$

From the definition of mean deviation we have

$$\begin{aligned}
\delta_m &= \frac{|4.9 - 8.45| + |5.4 - 8.45| + \cdots + |10.8 - 8.45|}{48} \\
&= \frac{3.55 + 3.05 + \cdots + 2.35}{48} \\
&= \frac{57.6}{48} = 1.20 \text{ per cent}
\end{aligned}$$

Using Theorem 2.5 and the fact that the sum of the smallest 24 values is
164.8 and the sum of the largest 24 values is 222.4, we obtain

$$\delta_m = \frac{222.4 - 164.8}{48} = \frac{57.6}{48} = 1.20 \text{ per cent}$$

Clearly, computing $\delta_m$ by the second method has several advantages, one of
them being that it is not necessary that $\mu_m$ be found. It should be noted that
the mean deviation is 82 per cent of the standard deviation for this population.
In general, for a bell-shaped distribution the mean deviation is about
80 per cent of the standard deviation. It can be shown that this percentage is
79 for the fruit fly distribution.
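
The two routes to the mean deviation compared in Example 2.5 can be sketched side by side. The data sets below are hypothetical and the function names are illustrative; the second function never computes the median, which is the advantage noted above.

```python
from statistics import median


def mean_deviation_definition(values):
    """Mean of absolute deviations from the median."""
    m = median(values)
    return sum(abs(x - m) for x in values) / len(values)


def mean_deviation_half_sums(values):
    """Theorem 2.5: (sum of higher half - sum of lower half) / N."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[n - half:]   # when n is odd the middle value is left out
    return (sum(upper) - sum(lower)) / n


# Hypothetical data, one set with even N and one with odd N.
even_set = [4.9, 5.4, 8.0, 8.9, 10.8, 7.1]
odd_set = [2, 3, 5, 9, 11]

for data in (even_set, odd_set):
    print(mean_deviation_definition(data), mean_deviation_half_sums(data))
```

Both functions agree on each data set, including the odd case where the median itself is excluded from the half-sums.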

Any good computational technique will include check devices, the best
techniques having checks at each important step of the calculations. A device
which is used to check the sum and the sum of squares of a set of values is
given in Theorem 2.6.

Theorem 2.6. If $x_1, x_2, \ldots, x_N$ is any set of N values, then

$$\sum (x_i + 1)^2 = \sum x_i^2 + 2 \sum x_i + N \qquad (2.26)$$

Proof. We have immediately that

$$\begin{aligned}
\sum (x_i + 1)^2 &= \sum (x_i^2 + 2x_i + 1) \\
&= \sum x_i^2 + 2 \sum x_i + N
\end{aligned}$$

The formula for grouped data is

$$\sum_{i=1}^{k} f_i (x_i + 1)^2 = \sum_{i=1}^{k} f_i x_i^2 + 2 \sum_{i=1}^{k} f_i x_i + \sum_{i=1}^{k} f_i \qquad (2.26a)$$

where $k$ is the number of groups.

Example 2.6. Adding one to each value $x_i$ of Table 2.2, verify that
Theorem 2.6 holds.

We find directly that

$$\sum x_i = 387.2 \quad \text{and} \quad \sum x_i^2 = 3225.64$$

Adding one to each value $x_i$ gives $x_1 + 1 = 5.9$, $x_2 + 1 = 6.4$, ...,
$x_{48} + 1 = 11.8$, and

$$\sum (x_i + 1)^2 = (5.9)^2 + (6.4)^2 + \cdots + (11.8)^2 = 4048.04$$

But

$$\sum x_i^2 + 2 \sum x_i + N = 3225.64 + 2(387.2) + 48 = 4048.04$$

Hence, we have verified that Theorem 2.6 holds, and we have increased our
confidence that $\sum x_i$ and $\sum x_i^2$ are correct.
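
The check device of Theorem 2.6 translates directly into a small routine. This is a sketch under the assumption that the recorded totals are passed in separately from the raw values; the function name `check_sums` is hypothetical.

```python
def check_sums(values, sum_x, sum_x2, tol=1e-9):
    """Theorem 2.6 check: sum((x+1)^2) should equal sum_x2 + 2*sum_x + N."""
    n = len(values)
    lhs = sum((x + 1) ** 2 for x in values)   # recomputed from the raw values
    rhs = sum_x2 + 2 * sum_x + n              # built from the recorded totals
    return abs(lhs - rhs) <= tol


values = [2, 3, 5]
ok = check_sums(values, sum_x=10, sum_x2=38)    # correct totals pass
bad = check_sums(values, sum_x=10, sum_x2=83)   # a transposed total is caught

# Right-hand side of (2.26) using the totals quoted in Example 2.6.
table_total = 3225.64 + 2 * 387.2 + 48

print(ok, bad, round(table_total, 2))
```

The check does not prove the totals correct, since compensating errors could pass it, but it catches the common slips in recording a sum or a sum of squares.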


2.5. APPROXIMATIONS IN CALCULATIONS 

In all calculations we should bear in mind that the usual rules for rounding
off numbers are not always adequate. If only one or two mathematical
operations are required the “rounded-off” answer is usually satisfactory, but
when long series of operations are performed there is obviously more
opportunity to make sizable errors. Thus, we should look more closely at the
real nature of the “rounding-off” methods.

With this in mind we proceed to define range numbers, that is, approximate
numbers written in a special form. An approximate value of a number,
or, more briefly, approximate number, is a number which differs from the true
value by some restricted small amount. Thus, if $x$ denotes the true value and
$x'$ an approximate value, then the error, $e$ or $\Delta x$, is the absolute value of
the difference; that is, $e = |x - x'|$.

If $e$ were known, the true value could be determined by using an
approximate value, and, in this case, we would not need any rules for rounding
off. However, $e$ is not known, but is often specified to be less than or
equal to some small positive quantity, say $\eta$. Thus, we could say, knowing
$x'$, that the true value $x$ is in the range from $x' - \eta$ to $x' + \eta$. For example,
when heights are recorded to the nearest inch, $\eta$ is 0.5 in., and a recording
of 68 in. is intended to represent any number from 68 - 0.5 = 67.5 to
68 + 0.5 = 68.5 in. Thus, when we say that 68 in. is an approximate number
with two significant figures, called the significant figures form, we understand
that the true value falls in the range of values from 67.5 to 68.5 in. We might
have indicated this by writing $68 \pm 0.5$ in., or, in a more compact notation,
68(0.5). Dwyer [5] refers to numbers written in this form as approximate-error
numbers, since 68 is approximate and 0.5 is the maximum error.

An approximate number may be expressed in many ways. It is important
that we understand that, regardless of the way we write it, an approximate
number represents a range within which the true value of the number is located.
For some purposes the form

$$\begin{bmatrix} x_h \\ x_l \end{bmatrix}$$

where $x_h$ is the largest possible value and $x_l$ is the smallest possible value of
$x$ in a range, is the best way to write an approximate number. An approximate
number expressed in this form is called a range number, and the two values
$x_h$ and $x_l$ are the components of the range number.

In the special case in which the maximum error $\eta$ is one-half unit in the
last decimal position, the significant figures, approximate-error, and range
number forms can all be used to express an approximate number. For example,
if $b = 2.1$ is an approximate number with two significant figures, we
understand that $\eta = 0.05$ and may write

$$2.1 = 2.1(0.05) = \begin{bmatrix} 2.15 \\ 2.05 \end{bmatrix}$$

However, if $\eta$ is not one-half unit, the significant figures form cannot be used.
Other disadvantages of the usual significant figures form will become apparent
in Examples 2.7 and 2.9. We must first indicate how to operate (add,
subtract, multiply, etc.) with range numbers.

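Let $a$ and $b$ be two approximate numbers written in significant figures
form. Then

$$\begin{bmatrix} a_h \\ a_l \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} b_h \\ b_l \end{bmatrix}$$

are the range forms of $a$ and $b$, respectively. In order to obtain the sum $a + b$
of $a$ and $b$ in range form, first express $a$ and $b$ in terms of range numbers;
then the sum is given by

$$\begin{bmatrix} a_h \\ a_l \end{bmatrix} + \begin{bmatrix} b_h \\ b_l \end{bmatrix} = \begin{bmatrix} (a + b)_h \\ (a + b)_l \end{bmatrix} = \begin{bmatrix} a_h + b_h \\ a_l + b_l \end{bmatrix} \qquad (2.27)$$

where $(a + b)_h = a_h + b_h$ is the largest value which could be obtained by
adding a value in the range from $a_l$ to $a_h$ to a value in the range from $b_l$ to
$b_h$, and $(a + b)_l = a_l + b_l$ is the smallest value which could be obtained by
adding a value in the range from $a_l$ to $a_h$ to a value in the range from $b_l$ to
$b_h$. It should be noted that Formula (2.27) gives the sum of two approximate
numbers even though they are not given in terms of significant figures. The
difference $a - b$ in range form is given by

$$\begin{bmatrix} a_h \\ a_l \end{bmatrix} - \begin{bmatrix} b_h \\ b_l \end{bmatrix} = \begin{bmatrix} (a - b)_h \\ (a - b)_l \end{bmatrix} = \begin{bmatrix} a_h - b_l \\ a_l - b_h \end{bmatrix} \qquad (2.28)$$

provided we let

$$-\begin{bmatrix} a_h \\ a_l \end{bmatrix} = \begin{bmatrix} -a_l \\ -a_h \end{bmatrix} \qquad (2.29)$$

Other operations are indicated in Example 2.7.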

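Example 2.7. Let 2.1, 1.7, 1.3, and 3.2 be approximate numbers with
two significant figures. Determine range numbers for each of the following:

a. $2.1 \times 1.3$

b. $2.1 + 1.7 + 1.3 + 3.2$

c. $(1.7 \times 3.2) - (2.1 \times 1.3)$

d. $3.2 \div 1.3$

e. $(1.3)^8$

We have, for a, b, c, d, e, respectively

$$2.1 \times 1.3 = \begin{bmatrix} 2.15 \\ 2.05 \end{bmatrix} \times \begin{bmatrix} 1.35 \\ 1.25 \end{bmatrix} = \begin{bmatrix} 2.9025 \\ 2.5625 \end{bmatrix}$$

$$2.1 + 1.7 + 1.3 + 3.2 = \begin{bmatrix} 2.15 \\ 2.05 \end{bmatrix} + \begin{bmatrix} 1.75 \\ 1.65 \end{bmatrix} + \begin{bmatrix} 1.35 \\ 1.25 \end{bmatrix} + \begin{bmatrix} 3.25 \\ 3.15 \end{bmatrix} = \begin{bmatrix} 8.50 \\ 8.10 \end{bmatrix}$$

$$\begin{aligned}
(1.7 \times 3.2) - (2.1 \times 1.3) &= \begin{bmatrix} 1.75 \\ 1.65 \end{bmatrix} \times \begin{bmatrix} 3.25 \\ 3.15 \end{bmatrix} - \begin{bmatrix} 2.15 \\ 2.05 \end{bmatrix} \times \begin{bmatrix} 1.35 \\ 1.25 \end{bmatrix} \\
&= \begin{bmatrix} 5.6875 \\ 5.1975 \end{bmatrix} - \begin{bmatrix} 2.9025 \\ 2.5625 \end{bmatrix} = \begin{bmatrix} 3.1250 \\ 2.2950 \end{bmatrix}
\end{aligned}$$

$$3.2 \div 1.3 = \begin{bmatrix} 3.25 \\ 3.15 \end{bmatrix} \div \begin{bmatrix} 1.35 \\ 1.25 \end{bmatrix} = \begin{bmatrix} 3.25/1.25 \\ 3.15/1.35 \end{bmatrix} \doteq \begin{bmatrix} 2.60 \\ 2.33 \end{bmatrix}$$

and

$$\begin{aligned}
(1.3)^8 &= \left( \begin{bmatrix} 1.35 \\ 1.25 \end{bmatrix}^2 \right)^4 \\
&= \left( \begin{bmatrix} 1.8225 \\ 1.5625 \end{bmatrix} \times \begin{bmatrix} 1.8225 \\ 1.5625 \end{bmatrix} \right)^2 = \begin{bmatrix} 3.32150625 \\ 2.44140625 \end{bmatrix}^2 \\
&= \begin{bmatrix} 11.0324037687890625 \\ 5.9604644775390625 \end{bmatrix} \doteq \begin{bmatrix} 11.04 \\ 5.96 \end{bmatrix}
\end{aligned}$$

where $3.15/1.35 \doteq 2.33$ is rounded off so that the lower component becomes
smaller, and the components of $(1.3)^8$ are rounded off so that they define a
range of values including the exact components which appear in the
next-to-the-last step of the calculations.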
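
The range-number operations used in Example 2.7 can be sketched as a small class. This is an illustrative implementation, not one given in the text: the class name `RangeNumber`, the helper `from_sig_figs`, and the use of exact fractions are all assumptions made for the sketch. Products and quotients take the largest and smallest of the four component combinations, which agrees with the example when all components are positive.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class RangeNumber:
    high: Fraction  # largest possible value
    low: Fraction   # smallest possible value

    @classmethod
    def from_sig_figs(cls, text):
        """Range form of a decimal string whose error is half a unit in the last place."""
        value = Fraction(text)
        decimals = len(text.split(".")[1]) if "." in text else 0
        eta = Fraction(1, 2 * 10 ** decimals)
        return cls(value + eta, value - eta)

    def __add__(self, other):   # Formula (2.27)
        return RangeNumber(self.high + other.high, self.low + other.low)

    def __sub__(self, other):   # Formula (2.28)
        return RangeNumber(self.high - other.low, self.low - other.high)

    def __mul__(self, other):
        products = [a * b for a in (self.high, self.low) for b in (other.high, other.low)]
        return RangeNumber(max(products), min(products))

    def __truediv__(self, other):
        if other.low <= 0 <= other.high:
            raise ZeroDivisionError("divisor range contains zero")
        quotients = [a / b for a in (self.high, self.low) for b in (other.high, other.low)]
        return RangeNumber(max(quotients), min(quotients))


a, b, c, d = (RangeNumber.from_sig_figs(t) for t in ("2.1", "1.7", "1.3", "3.2"))
product = a * c                 # part a
total = a + b + c + d           # part b
difference = b * d - a * c      # part c
quotient = d / c                # part d
square = c * c
eighth = (square * square) * (square * square)   # part e

for r in (product, total, difference, quotient, eighth):
    print(float(r.high), float(r.low))
```

Exact fractions are used so that the components match the hand calculation digit for digit; any final rounding should widen the range (upper component up, lower component down), as the example does.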

Example 2.8. Compute the values in Example 2.7 by the usual rounding-off
methods.

We have, for a, b, c, d, e, respectively

$$2.1 \times 1.3 = 2.73 \doteq 2.7$$
$$2.1 + 1.7 + 1.3 + 3.2 = 8.3$$
$$(1.7 \times 3.2) - (2.1 \times 1.3) = 5.44 - 2.73 = 2.71 \doteq 2.7$$
$$3.2 \div 1.3 = 2.46154 \doteq 2.5$$
$$(1.3)^8 = (1.69)^4 = (2.8561)^2 = 8.15730721 \doteq 8.2$$

Example 2.9. Compare the results in Examples 2.7 and 2.8.

In order to compare the solutions, we write the answers in Example 2.8,
expressed in significant figures form, as range numbers. Thus

$$2.7 = \begin{bmatrix} 2.75 \\ 2.65 \end{bmatrix}$$

It is clear that

$$\begin{bmatrix} 2.9025 \\ 2.5625 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 2.75 \\ 2.65 \end{bmatrix}$$

do not tell the same story. The first range number tells us that the product is
some value in the range from 2.5625 to 2.9025 and the second that the product
is some value in the range from 2.65 to 2.75. Clearly the first represents all
the values, and the second fails to represent many possible values of the
product. Similar and more exaggerated statements can be made concerning
the comparison of the other values. Compare, for example, the two range
numbers for $(1.3)^8$,

$$\begin{bmatrix} 11.04 \\ 5.96 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 8.25 \\ 8.15 \end{bmatrix}$$

If we attempt to compare the computed values by using significant
figures, we run into some interesting complications. For the product $2.1 \times 1.3$
we have only one significant figure, namely 3, when using range numbers,
but the usual rule gives two, namely, 2.7. For the quantity $(1.7 \times 3.2) - (2.1 \times 1.3)$
the usual rule gives 2.7 with two significant figures, but the range
number method fails to indicate that any single number can be used. Thus,
it appears that in some cases the rules normally used in determining
significant figures are altogether misleading. For further discussion of range
numbers and approximate-error numbers, see Dwyer [5].

The above discussion involving range numbers was used to emphasize 
some of the reasons why the rounding-off rules should be applied with cau- 
tion. We will not at this time get into an exhaustive discussion of the pros 
and cons of the rounding-off methods, but we will point out that the usual 
argument in favor of these methods is that numbers which overestimate and 
numbers which underestimate a set of measurements behave in such a way 
that the errors tend to cancel (balance out) each other in a series of calcula- 
tions. 


2.6. EXERCISES 

The first exercises are intended to lead to a better understanding of the 
numerical methods described in Sects. 2.4 and 2.5; the last refer to all topics 
of this chapter as well as to certain related topics. 

2.46. Compute the mean for the weights of the humans in Exercise 2.43, using
(a) Theorem 2.2, (b) Theorem 2.3, and (c) Theorem 2.4.

2.47. Compute the mean deviation for the weights of the rats in Exercise
2.43, using (a) the definition, (b) Theorem 2.5, and (c) Formula (2.11).

2.48. Compute the mean deviation for the weights of the humans in Exercise
2.43, using (a) the definition, (b) Theorem 2.5, and (c) Formula (2.11).

2.49. Consider weights of the rats in grams in Exercise 2.43. (a) Use Formula
(2.20) to find the variance of the weights. (b) Subtract 200 grams from
each weight and use Theorem 2.2 to find the variance of the weights.
(c) Let $x_0 = 200$ and $K = 5$ and use Theorem 2.4 to find the variance
of the weights. (d) The variance has been obtained by four methods if
we include the method [definition in Formula (2.8)] used in Exercise
2.43. Compare the amount of computation required in obtaining these
answers; the accuracy of the answers.

2.50. Increase the weight of one rat in Exercise 2.43 by five grams and answer
all the parts in Exercise 2.49.

2.51. Compute the variance for the weights of the humans in Exercise 2.43,
using any method in Sect. 2.4.

2.52. Illustrate Theorem 2.6 using (a) the weights of the rats in Exercise 2.43
and (b) the weights of the humans in Exercise 2.43.

2.53. (a) Find the variance $\sigma_g^2$ for the grouped data in Table 2.3. (b) Find
$\sigma_c^2 = \sigma_g^2 - c^2/12$, where $c$ is the common class length. (c) In Sect. 2.3.2
we found that $\sigma^2 = 2.13$ using the data in Table 2.2. Compare $\sigma^2$, $\sigma_g^2$,
and $\sigma_c^2$.

Note. When parameters are computed from a grouped frequency
distribution, certain errors are introduced as a result of assuming the
variable values are concentrated at the midpoint of the class intervals.
Sheppard [8], Wold [10, 11], Craig [2], and others have suggested
corrections for specified parameters for certain types of distributions.
The corrected variance defined in Exercise 2.53(b) and known as
Sheppard's correction is an improvement over $\sigma_g^2$ for bell-shaped
distributions with sufficiently small class lengths, as is illustrated in the above
case. However, for certain grouped frequency distributions $\sigma_c^2$ is not
necessarily an improvement over $\sigma_g^2$ and should be used with caution.

2.54. (a) Find the mean, variance, and standard deviation for the maximum
temperatures for the grouped data in Exercise 2.15(a). (b) Find the mean,
variance, and standard deviation for the minimum temperature for
the grouped data in Exercise 2.15(a). (c) Use the mean and standard
deviation computed in (b) to describe the frequency distribution.

2.55. (a) Find the mean, variance, and standard deviation for the differences
(maximum temperature minus minimum temperature) for the grouped
data in Exercise 2.16(a). (b) Use the mean and standard deviation found
in (a) to describe the frequency distribution of differences.

2.56. (a) Find the mean, variance, and standard deviation for the grouped
data (verbal scores) in Exercise 2.13(a). (b) Describe the frequency
distribution of verbal scores in terms of the mean and standard deviation.

2.57. (a) Find the median and mean deviation $\delta_m$ for the differences (maximum
temperature minus minimum temperature) for the grouped data in
Exercise 2.16(a). (b) Find the median and mean deviation for the
verbal scores in Exercise 2.13(a). (c) Using $\sigma$, found in Exercises 2.55(a)
and 2.56(a), compute $\delta_m/\sigma$ in each case and compare, considering the
note below.

Note. The mean deviation is about 80 per cent of the standard deviation
for unimodal curves which are symmetric or nearly symmetric.
In particular, for a “normal” distribution the ratio is $\sqrt{2/\pi} = 0.798$.

2.58. Show that $\delta_m/\sigma = 0.791$ for the fruit fly data in Table 2.5.

2.59. Use range numbers to find the following

(a) (12)x(17) (e) 12-17 

(b) (1 2)> (0 2 3 1 2 

<c) 0 2)‘ (g) (I 2)* -a 7) X (0 7)* 

W)>2-12 (h)(213-^ 12) -(17x07)

2.60. Prove Formula (2.28).

2.61. (a) Express each of the following as range numbers

202(3)        2475(50)

2.73(0.07)    2.222(0.022)

(b) Express each of the following as approximate-error numbers


r3.75l 


[42.11 [7.63] 



00 

d 


2.62. Evaluate and express results in range form 


(a) 

(b) 


-2.79' 


'0.56' 

.2.73. 


.0.37. 

■2.79'' 


0.56' 

.2.73. 


.0.37. 


-1.17 

.-1.26 



(c) 


■ 0.27' 
.-0.04. 


-1.17' 

.-1.26. 


2.63. Evaluate and express results in approximate-error form 


(a) 

(b) 

(c) 

(d) 


44.7(0.3) + 22.6(0.8) - 31.2(1.3) 

2.76(0.3) 

0.465(0.095) 

^22.3(1.1) 

6 

2.3(0.2) 


2.64. The heights in inches of 20 men are

64.0   72.5   69.0   69.5   68.5
70.5   69.5   69.0   71.5   70.0
70.5   70.0   73.5   71.0   71.5
70.5   69.5   69.5   71.0   72.0


(a) Find the mean height of these men. (b) Write each height as two signifi- 
cant figures, always increasing numbers ending in 5 by 0.5, and find the 
mean, (c) Write each height as two significant figures, always changing 
numbers ending in 5 to the nearest even integer, and find the mean, 
(d) Compare the results obtained in (b) and (c) with (a). 

Note. We shall always use the method in (c). 

2.65. Prove that $\delta_m$ is not greater than $\sigma$.

2.66. The frequency of each of the values $x = 1, 2, \ldots, n$ is 5. Find the mean
and variance of this distribution.

2.67. The frequency of each of the values $x = 2, 4, \ldots, 2n$ is $c$. Find the mean
and variance of this distribution.

2.68. (a) Prove that the mean of the distribution whose relative frequencies
at $x = 0, 1, 2, \ldots, r, \ldots$ are

$$e^{-\mu}, \quad \frac{e^{-\mu}\mu}{1!}, \quad \frac{e^{-\mu}\mu^2}{2!}, \quad \ldots, \quad \frac{e^{-\mu}\mu^r}{r!}, \quad \ldots$$

is $\mu$.
(b) Prove that the variance of the distribution in (a) is also $\mu$.

2.69. The relative frequencies of the values $x = 0, 1, 2, \ldots, n$ are the successive
terms in the expansion of $(\tfrac{1}{2} + \tfrac{1}{2})^n$. Find the mean and variance of this
distribution.

2.70. The relative frequencies of the values $x = 0, 1, 2, \ldots, n$ are the successive
terms in the expansion of $(p + q)^n$, where $p + q = 1$. Show that $\mu = np$
and $\sigma^2 = npq$.

The few distributions which are used as models for most of the work
done in statistics are defined and a few simple properties of each are
enumerated. The density and distribution functions are defined and compared.

3.1. INTRODUCTION

We have discussed the terms frequency distribution and relative frequency 
distribution, using tables and graphs to describe the “shape” of a distribu- 
tion. Now we wish to present mathematical functions which “describe” these 
distributions and which serve as models for most collections of data. This is 
what we do, for example, when we use points, lines, and planes in geometry 
for models. We think of these terms as being conceptual idealizations of 
real-life objects, and we use the models because they lend themselves so 
easily to mathematical manipulations. For example, the draftsman in pre- 
paring blue prints for an apartment house uses lines (images of geometrical 
lines) to represent walls, windows, doors, etc. 

In Chap. 2 distributions were discussed in terms of frequencies and
relative frequencies. Now, as our treatment of distributions becomes more
formal and more theoretical with the introduction of models and theoretical
distributions, we use the term probability in place of relative frequency. We
think of probability as a sort of idealization of such terms as “relative
frequency,” “proportion,” and “part of,” and introduce formal properties of
probability as they are required in the development of theoretical distributions.
In Chap. 4 we define probability of an event as the limit of the relative
frequency with which it occurs and demonstrate how such a definition is
useful in estimating probability or in arriving at meaningful assignments of
probability. Even though the axiomatic definition and resulting properties of
probability introduced in this chapter are desirable for a formal development
of theoretical distributions, they are of no real value when it comes to assigning
probabilities to events or to numerical values of events.

3.2. DISCRETE DISTRIBUTIONS OF ONE VARIABLE

The essential properties of discrete density and distribution functions
have already been suggested in Sect. 2.2.2. Thus, we need only introduce
notation in order carefully to describe the general case.

For our purposes we think of probability as a measure associated with
a real number which has been assigned to an event in an experiment. Thus,
probability is a measure associated with an event in an experiment. For
example, we may associate the probability measure 0.25 with the occurrence
of two heads (event) in two tosses of a balanced coin (experiment). If the real
number assigned to the event is the number of heads, then we may say,
for this experiment, that the probability of 2 is 0.25. In this chapter we are
concerned with properties of two important probability functions and not
with assignments of probability measures to events (or to real numbers of
events). Thus, for our immediate purposes the formal axiomatic approach to
probability is sufficient.

Assume that $x$ may take on either a finite number (see Exercises 2.66
and 2.67) or a countably infinite number (see Exercise 2.68) of discrete real
values denoted by $x_1, x_2, \ldots, x_k$, where $k$ may be either finite or countably
infinite. Let

$$S_p = \{x_{p1}, x_{p2}, \ldots, x_{pa}\}$$

be the $p$th subset of the set

$$S = \{x_1, x_2, \ldots, x_k\} \qquad (3.1)$$

[It is to be understood that $p$ is either a positive integer or countably infinite,
that $a \le k$ and $k$ may be countably infinite, and that the $j$th $(j = 1, 2, \ldots, a)$
element in subset $p$ is an element from $S$.] Let $P[S_p]$ be a non-negative
real number assigned to $S_p$. If a subset $S_p$ contains the single
number $x_{p1} = x_i$, we may write $P[x = x_i]$ or $f(x_i)$ in place of $P[S_p]$. Then
the discrete variable $x$ is said to have a probability density function $f$, provided
that the real number $f(x_i)$ assigned to $x_i$ $(i = 1, 2, \ldots, k)$ satisfies the
following conditions:

(a) $f(x_i)$ is a positive real number for each $x_i$

(b) $\sum_{i=1}^{k} f(x_i) = 1$ \qquad (3.2)

(c) $P[S_p] = \sum_{j=1}^{a} f(x_{pj})$ for each $S_p$

We also call $f$ a density function or probability function. Assuming that it
will lead to no confusion in the future, we use the traditional notation $f(x)$
to denote density function. [For the discrete case, the reader should note
that $f(x_i)$ denotes “function value” and $f(x)$ denotes “function.”] When $f(x)$
is a density function, $f(x_i) = P[x = x_i]$ gives the probability that $x$ has the
value $x_i$; that is, $f(x_i)$ is the probability measure of $x_i$.

We are usually interested in relations such as $x < x_i$, $x > x_i$, or
$x_a \le x \le x_b$, where $a$ and $b$ are any two positive integers such that $a < b$
or $a$ is a positive integer and $b$ is countably infinite. In this context, we may
replace (c) in Eq. (3.2) by the following restricted but useful form:

(c') $P[x_a \le x \le x_b] = \sum_{i=a}^{b} f(x_i)$, where the real numbers in $S$ defined
in Eq. (3.1) are ordered; that is,

$$x_1 < x_2 < \cdots < x_k.$$

The reader should note that in (c') it makes no difference whether we think
of $x$ in the relation $x_a \le x \le x_b$ as assuming only values in $S$ or as assuming
all real numbers between $x_a$ and $x_b$ inclusive, it being tacitly assumed that
$f(x)$ is zero for all values of $x$ not in $S$.

Example 3.1. Think of the values of $x$ as being 1, 2, 3, 4, 5, 6 and as
representing the numbers on the faces of a die. If the die is properly balanced,
it seems reasonable to let the probability assigned to each value be $\tfrac{1}{6}$, i.e.,
$f(1) = f(2) = \cdots = f(6) = \tfrac{1}{6}$. Clearly, $f(x_i)$ is a density function, for
Conditions (3.2a) and (3.2b) hold by definition, and Condition (3.2c) follows
immediately. A graphical description of this density function in the form of
a line diagram is shown in Fig. 3.1. Each height (length of line segment)
represents the probability of the corresponding number. As an illustration
of Condition (3.2c) we note that the probability of the occurrence of an even
face or of the set $\{2, 4, 6\}$ is $f(2) + f(4) + f(6) = \tfrac{1}{2}$.

Fig. 3.1 Theoretical Density Function for Die
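
Conditions (3.2a) to (3.2c) can be checked mechanically for the die of Example 3.1. The sketch below stores a discrete density as a dictionary from value to probability; that representation and the function names `is_density` and `prob` are assumptions made for illustration.

```python
from fractions import Fraction

# Density of Example 3.1: each face of a balanced die has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def is_density(f):
    """Conditions (3.2a) and (3.2b): every value positive, total equal to one."""
    return all(p > 0 for p in f.values()) and sum(f.values()) == 1


def prob(f, subset):
    """Condition (3.2c): the probability of a subset is the sum over its elements."""
    return sum(f[x] for x in subset)


p_even = prob(die, {2, 4, 6})
print(is_density(die), p_even)
```

Using exact fractions avoids the small floating-point discrepancy that can make six copies of 1/6 fail to sum to exactly one.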

We think of the distribution function as being an idealization of a
cumulative relative frequency distribution, just as the density function is an
idealization of a relative frequency distribution. In many problems we are
interested in the probability that the value of a real number is equal to or
less than some specified value $x$. It is customary to define such a cumulative
probability for all real values of $x$ from $-\infty$ to $\infty$. If we let

$$x_1 < x_2 < \cdots < x_k$$

the cumulative probability function $F$ is defined by the rule

$$F(x) = \begin{cases}
0, & \text{when } x < x_1 \\
\sum_{i=1}^{j} f(x_i), & \text{when } x_j \le x < x_{j+1} \quad (j = 1, 2, \ldots, k-1) \\
1, & \text{when } x \ge x_k
\end{cases} \qquad (3.3)$$

This is also called the distribution function of the discrete random variable
$x$. [The reader should, just as with $f(x)$, note that $F(x)$ will be used in the
traditional sense; that is, $F(x)$ may denote “the value at $x$” or “function of
$x$.” The context in which $F(x)$ is used should make its meaning clear.] It
follows from Definition (3.3) that the distribution function has the following
two important properties:

$F(x)$ is a nondecreasing function of the continuous variable $x$ \qquad (3.4)

$0 \le F(x) \le 1$ for each value of $x$ \qquad (3.5)
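
Definition (3.3) amounts to summing the density over all values not exceeding $x$. A minimal sketch for the balanced die follows; the dictionary representation of the density and the helper name `distribution_function` are assumptions made for illustration.

```python
from fractions import Fraction


def distribution_function(f):
    """Return F with F(x) = sum of f(x_i) over all x_i <= x, as in Definition (3.3)."""
    xs = sorted(f)

    def F(x):
        return sum((f[xi] for xi in xs if xi <= x), Fraction(0))

    return F


die = {face: Fraction(1, 6) for face in range(1, 7)}
F = distribution_function(die)

# F is defined for every real x, not only for the six faces.
for x in (0.5, 1, 3.7, 6, 100):
    print(x, F(x))
```

The printed values rise in steps from 0 to 1 and never decrease, which is Properties (3.4) and (3.5) for this particular density.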

Example 3.2. The distribution function associated with the density
function in Example 3.1 is given by

$$F(x) = \begin{cases}
0, & \text{when } x < 1 \\
\tfrac{1}{6}, & \text{when } 1 \le x < 2 \\
\tfrac{2}{6}, & \text{when } 2 \le x < 3 \\
\tfrac{3}{6}, & \text{when } 3 \le x < 4 \\
\tfrac{4}{6}, & \text{when } 4 \le x < 5 \\
\tfrac{5}{6}, & \text{when } 5 \le x < 6 \\
1, & \text{when } x \ge 6
\end{cases} \qquad (3.6)$$

The graph of Eq. (3.6) is shown in Fig. 3.2. It should be observed that $F(x)$
is a nondecreasing function of $x$ with a finite number of jumps (or steps or
saltuses) and that the $i$th jump is $f(x_i)$.

Fig. 3.2 Theoretical Distribution Function for Die

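The mean $\mu$ and variance $\sigma^2$ of the discrete variable $x$ with density
function $f(x)$ are given by

$$\mu = \sum_{i=1}^{k} x_i f(x_i) \qquad (3.7)$$

and

$$\sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 f(x_i) \qquad (3.8)$$

respectively. In case N is finite it is easy to show that Formulas (2.2) and
(2.9) are special cases of Eqs. (3.7) and (3.8). If we assume each of the N
observations to have probability $1/N$, it follows that $x_i$ has probability
$f(x_i) = f_i/N$, since $\sum f_i = N$. Thus

$$\mu = \sum_{i=1}^{k} x_i f(x_i) = \sum_{i=1}^{k} x_i \frac{f_i}{N} = \frac{\sum x_i f_i}{N}$$

and

$$\sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 f(x_i) = \sum_{i=1}^{k} (x_i - \mu)^2 \frac{f_i}{N} = \frac{\sum (x_i - \mu)^2 f_i}{N}$$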
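
Definitions (3.7) and (3.8), and their reduction to the grouped-data formulas when $f(x_i) = f_i/N$, can be sketched as follows. The density is again stored as a dictionary, and the small frequency table is hypothetical.

```python
from fractions import Fraction


def mean(f):
    """Definition (3.7): sum of x_i * f(x_i)."""
    return sum(x * p for x, p in f.items())


def variance(f):
    """Definition (3.8): sum of (x_i - mu)^2 * f(x_i)."""
    mu = mean(f)
    return sum((x - mu) ** 2 * p for x, p in f.items())


# Density of the balanced die from Example 3.1.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Hypothetical grouped data (value -> frequency), turned into a density f_i / N.
freqs = {2: 1, 3: 2, 5: 1}
N = sum(freqs.values())
grouped = {x: Fraction(fi, N) for x, fi in freqs.items()}

print(mean(die), variance(die))
print(mean(grouped), variance(grouped))
```

For the grouped table the results coincide with $\sum f_i x_i / N$ and $\sum f_i (x_i - \mu)^2 / N$, which is the special case described above.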

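Theorem 3.1. If $f(x)$ is the density function of the discrete variable $x_i$ $(i = 1, 2, \ldots, k)$, then

$$\sum_{i=1}^{k} x_i^2 f(x_i) = \sigma^2 + \mu^2 \qquad (3.9)$$

Proof. From Eq. (3.8) we have

$$\begin{aligned}
\sigma^2 &= \sum (x_i^2 - 2\mu x_i + \mu^2) f(x_i) \\
&= \sum x_i^2 f(x_i) - 2\mu \sum x_i f(x_i) + \mu^2 \sum f(x_i) \\
&= \sum x_i^2 f(x_i) - 2\mu \cdot \mu + \mu^2 \cdot 1 \quad \text{by Eqs. (3.7) and (3.2)} \\
&= \sum x_i^2 f(x_i) - \mu^2
\end{aligned}$$

Example 3.3. Find the mean and variance of the discrete variable with
density function given in Example 3.1.

By Definition (3.7)

$$\mu = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + \cdots + 6 \cdot \tfrac{1}{6} = \tfrac{21}{6} = 3.5$$

and, by Definition (3.8)

$$\sigma^2 = (1 - 3.5)^2 \tfrac{1}{6} + \cdots + (6 - 3.5)^2 \tfrac{1}{6} = \frac{17.50}{6} = \frac{8.75}{3} \doteq 2.92$$

Also, since $\sum x_i^2 f(x_i) = 1^2 \cdot \tfrac{1}{6} + 2^2 \cdot \tfrac{1}{6} + \cdots + 6^2 \cdot \tfrac{1}{6} = \tfrac{91}{6}$, we have, by
Theorem 3.1 and Definition (3.7),

$$\sigma^2 = \tfrac{91}{6} - \left(\tfrac{7}{2}\right)^2 = \tfrac{35}{12} \doteq 2.92$$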

Example 3.4. Find the probability of values 2, 3, 4, and 5 in Example 3.1.

From Eq. (3.2c') we obtain

$$P[2 \le x \le 5] = f(2) + f(3) + f(4) + f(5) = \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{2}{3}$$

Note. Since $F(x_i) - F(x_{i-1}) = f(x_i)$, we have

$$P[2 \le x \le 5] = \sum_{i=2}^{5} f(x_i) = \sum_{i=1}^{5} f(x_i) - \sum_{i=1}^{1} f(x_i) = F(5) - F(1) = \tfrac{5}{6} - \tfrac{1}{6} = \tfrac{2}{3}$$

It can be shown that for any two positive integers $a$ and $b$ such that
$a \le b$,

$$F(x_b) - F(x_{a-1}) = \sum_{i=a}^{b} f(x_i) \qquad (3.10)$$

The six values resulting from the toss of a well-balanced die have been
used to illustrate density and distribution functions. At this time we present
two discrete distributions which are of great importance in practice as well
as theory. They are also used frequently in the later chapters of this book.

3.2.1. Dichotomous Distribution

The simplest of all distributions and one which occurs so often in practice 
results from our tendency to place each observation from a population in 
one or the other of two categories (or classes or groups), mutually exclusive 
and together exhaustive. For example, we divide a population of some manu- 
factured product into defective and nondefective items, a population of 
students into male and female, a population of responses to a question into 
yes and no, and a population of diseased people into those recovering and 
those who do not. Generally, the variable is one we would consider to be 
qualitative, but since only two categories are involved, it is easy to think 
of the variable as being quantitative. It is customary to call an observation 
a “success” or a “failure” depending on which category it falls into, and for 
mathematical reasons to denote failure by 0 and success by 1. It does not 
make any difference, so far as the mathematics is concerned, whether we 
denote success by 1 or by 0. We use 1, following convention. 

Letting p denote the probability with which 1 (success) occurs and
q = 1 - p the probability with which 0 (failure) occurs, we may write the
density function for a dichotomous variable d(x; p) as

$$f(x) = d(x; p) = \begin{cases} q, & \text{when } x = 0 \\ p, & \text{when } x = 1 \end{cases} \tag{3.11}$$


Once p is known, d(x; p) is uniquely determined. We say that d(x; p)
represents a one-parameter family of distributions, since only one parameter
is required in order to determine the distributions. For p = 0.3, say, we
have the particular dichotomous distribution with density function

$$d(x; 0.3) = \begin{cases} 0.7, & \text{when } x = 0 \\ 0.3, & \text{when } x = 1 \end{cases}$$

The distribution function is

$$F(x) = \begin{cases} 0, & \text{when } x < 0 \\ q, & \text{when } 0 \le x < 1 \\ 1, & \text{when } x \ge 1 \end{cases} \tag{3.12}$$


If we use Eqs. (3.7) and (3.8), it follows that the mean and variance of a
dichotomous variable with density function defined by Eq. (3.11) are

$$\mu = 0 \cdot q + 1 \cdot p = p \tag{3.13}$$

and

$$\sigma^2 = (0 - p)^2 q + (1 - p)^2 p = pq(p + q)$$

or

$$\sigma^2 = pq \tag{3.14}$$
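
As a quick numerical check of Eqs. (3.13) and (3.14), the sketch below
applies the general definitions (3.7) and (3.8) to the two-point density of
Eq. (3.11). The function name `dichotomous_moments` and the trial values of
p are assumptions made only for illustration.

```python
def dichotomous_moments(p):
    """Mean and variance of a 0/1 variable computed from Eqs. (3.7) and (3.8)."""
    q = 1.0 - p
    density = {0: q, 1: p}  # Eq. (3.11)
    mean = sum(x * f for x, f in density.items())
    variance = sum((x - mean) ** 2 * f for x, f in density.items())
    return mean, variance


for p in (0.1, 0.3, 0.5, 0.9):
    mean, variance = dichotomous_moments(p)
    # The last two columns should agree: variance from (3.8) and the closed form p*q.
    print(p, mean, variance, p * (1.0 - p))
```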

The student should be cautioned concerning the dual use of the symbol
p. By definition p denotes the probability of x = 1. On the other hand, p
also is that value of the variable x which is the mean. Figure 3.3 should
make the distinction clear.

Fig. 3.3 Dichotomous Distribution

The dichotomous population just considered is sometimes called a
binomial population. We refrain from using the term binomial distribution at
this time, because it might lead to confusion when we discuss sampling
distributions determined from dichotomous distributions, as discussed in
Sect. 5.3.

3.2.2. Poisson Distribution

We have already considered in some detail the nature of two discrete
distributions with finitely many possible values of the variable x. Now we
describe a discrete distribution called the Poisson distribution with 0, 1, 2,
3, ... being the possible values of x.

The Poisson density function p(x; m) is given by

$$f(x) = p(x; m) = \frac{e^{-m} m^x}{x!}, \qquad x = 0, 1, 2, 3, \ldots \tag{3.15}$$

where m is a positive real constant and e is the base of the natural logarithms, 
or 


It should be recalled that 
It should be observed that the variable x in Eq. (3.15) needs no subscript,
since its possible values are the nonnegative integers. The Poisson
distribution is uniquely determined by the single parameter m. Thus, we
sometimes say that the Poisson is a one-parameter family of distributions and
use the notation f(x; m) or p(x; m) to denote this. The notation p(x; 1.0)
denotes the Poisson density function with m = 1.0 and represents one member
of the family.

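For any positive real number m and any nonnegative integer x, we see
that $e^{-m} m^x / x!$ is always positive. Further

$$\sum_{x=0}^{\infty} f(x) = \sum_{x=0}^{\infty} \frac{e^{-m} m^x}{x!} = e^{-m} \left( 1 + \frac{m}{1!} + \frac{m^2}{2!} + \cdots \right) = e^{-m} e^{m} = 1$$
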

The three conditions in Eq.(3.2) hold, if we assume that the probability of 
any set Sp is given by Condition (3.2c). Hence, we were justified in calling 
Eq. (3.15) a density function. 

It can be shown (see Exercise 2.68) that both the mean and variance of
a discrete variable with density function given by Eq. (3.15) are equal to m,
i.e.

$$\mu = \sigma^2 = m \quad \text{for a Poisson distribution} \tag{3.16}$$

Solutions of many problems found in practice are, as we shall see later, much
easier to obtain due to this rare property.
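
Property (3.16) can be illustrated numerically. The sketch below evaluates
the Poisson density of Eq. (3.15) and approximates the infinite sums in
Eqs. (3.7) and (3.8) by their first 100 terms; the truncation point and the
function names `poisson_density` and `poisson_moments` are assumptions chosen
for illustration, adequate only for moderate values of m.

```python
import math


def poisson_density(x, m):
    """Poisson density p(x; m) = e^(-m) m^x / x! for x = 0, 1, 2, ..."""
    return math.exp(-m) * m ** x / math.factorial(x)


def poisson_moments(m, terms=100):
    """Total probability, mean, and variance from the first `terms` values of x."""
    xs = range(terms)
    total = sum(poisson_density(x, m) for x in xs)
    mean = sum(x * poisson_density(x, m) for x in xs)
    variance = sum((x - mean) ** 2 * poisson_density(x, m) for x in xs)
    return total, mean, variance


for m in (0.5, 1.0, 2.5, 5.0):
    # Each printed triple should be close to (1, m, m).
    print(m, poisson_moments(m))
```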

In any applied problem we would not expect x to be infinitely large.
Thus, the Poisson distribution can serve only as a model and an
approximation to real-life situations. However, the limitation is not as
restrictive as it might seem at first.

In order to get some idea of how little we are restricted, since, in fact,
x may get infinitely large, let us consider Table 3.1, which lists p(x; m)
correct to four decimal places for selected values of m. From this table it
is clear that the cumulated probabilities from some small value of x upwards
are zero if only four decimal places are required. Thus, for example

$$\sum_{x=8}^{\infty} p(x; 1.0) = 0.0000$$

since

$$\sum_{x=0}^{7} p(x; 1.0) = 1.0000$$
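
The size of the neglected tail can be computed directly. The following
sketch, which assumes m = 1.0 and a tail starting at x = 8 as in the
illustration above, obtains the upper-tail probability as one minus a finite
sum; the helper name `upper_tail` is hypothetical.

```python
import math


def poisson_density(x, m):
    """Poisson density p(x; m) = e^(-m) m^x / x!."""
    return math.exp(-m) * m ** x / math.factorial(x)


def upper_tail(m, start):
    """Probability of x >= start, computed as 1 minus the finite sum below start."""
    return 1.0 - sum(poisson_density(x, m) for x in range(start))


tail = upper_tail(1.0, 8)
print(tail)               # the exact size of the neglected tail
print(tail < 0.00005)     # True exactly when the tail rounds to 0.0000
```

Any tail smaller than 0.00005 disappears when probabilities are recorded to
four decimal places, which is the sense in which the model's infinite range
is harmless in practice.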
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sides, and blue on the remaining sides, (a) Define density and distribution 
functions which describe the distribution of colors, (b) Give an illustra- 
tion in which the density function of (a) serves as a model. 

3.2. (a) Find the density and distribution functions of the sum of two numbers 
(dots on two dice) which appear on two balanced dice, (b) Find the 
mean and variance of the sums, (c) Graph the density and distribution 
functions, indicating the location of the mean on the density graph. 

3.3. (a) The discrete uniform distribution has the density function f(x) = 1/m,
where x = 1, ..., m. Find the distribution function as well as the
mean and variance of the variable x. (b) Give an illustration of a particular
uniform distribution.

3.4. (a) Given $f(x) = c(x - 2)^2$, where x = 1, 2, ..., 6, find c so that f(x)
is a density function. (b) Find the distribution function and graph it.
(c) Find the mean and variance of this variable x.

3.5. Given f(x) = c(x - 2), where x = 1, 2, ..., 6, is it possible to find a
constant c so that f(x) is a density function? Why?

3.6. If 1000 observations closely approximate a Poisson distribution with 
mean 2.5, how many of them would have x values of 2, 4, 6, 8, 10? 

3.7. The variance of a population known to have the Poisson distribution 
is 2.0. (a) Determine the distribution function of this population, (b) 
Use Table 3.1 to find what proportion of the x values are less than 3; 
between 2 and 5; between 2 and 5 inclusive. 

3.8. (a) Find c so that f(x) = c/x!, where x = 0, 1, 2, ... is a density
function. (b) Find P[x < 2]; P[x ≤ 2] for the density function in (a).

3.9. Assume that the number of telephone calls x coming to a certain switch- 
board during a period of one minute is approximately distributed as 
a Poisson variable with mean 5. (a) For what proportion of one minute 
intervals will there be more than ten calls coming to the switchboard 
per one minute interval? (b) If the switchboard can handle, at most, 
12 calls per minute, what proportion of one minute intervals will the 
switchboard be overtaxed? 

3.10. The number x of white blood corpuscles on slides of a fixed size is assumed 
to be distributed as a Poisson variable with mean 4. If more than ten 
indicates a dangerous surplus of white corpuscles, what proportion of 
slides fall in this category? 

3.11. Let

$$f(x) = \left(\tfrac{1}{2}\right)^x$$

where x = 1, 2, .... (a) Is f(x) a density function? (b) If f(x) is a density
function, find the mean of this distribution. (Strictly speaking, we should
say "mean of the variable x," but this is a common expression which
we sometimes use.)

Hint. Differentiate both sides of the identity $1 + a + a^2 + \cdots = 1/(1 - a)$
with respect to a; let a = 1/2, etc.
(c) This function serves as a model for a balanced coin which is tossed
until a head appears for the first time on the xth toss. Give another
illustration in which $f(x) = (1/2)^x$ serves as a model.

3.3. CONTINUOUS DISTRIBUTIONS OF ONE VARIABLE

The regular forms of the histograms and cumulative polygons discussed
in Chap. 2 suggest that in favorable cases data are approximations to
distributions which can be represented by smooth curves and be given simple
mathematical expressions. Now we consider how this can be accomplished.

Think of the case where the area of the histogram is made equal to one.
From the discussion in Chap. 2 it is clear that the sum of the areas of
neighboring rectangles is equal to the relative frequency with which
(proportion of times) the value of x falls in the intervals which make up the
bases of those rectangles. Now suppose we think of subdividing each interval
into two, then four, then eight, etc. smaller intervals. (For a finite
collection of observations the histograms would look smoother for a while as
the number of rectangles increased, but eventually the resulting histograms
would become more and more irregular.) Conceptually, in an infinite
population we would rarely have irregular relative frequencies as the number
of subdivisions increases. Since the property of smoothness would continue to
hold as the number of subdivisions increases indefinitely, the area under the
limiting or idealized curve between any two given values of x should be equal
to the relative frequency with which x would lie in the interval determined
by those two values of x.

The function f(x) whose graph is the limiting curve of the series of
histograms just described is considered the mathematical model and is called
the density function of the variable x. Formally, we say the continuous
variable x has a probability density function f(x) if it satisfies the
following properties:

(a) f(x) is a single-valued nonnegative real number for
all real values of x

(b) $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (3.17)

(c) $\int_a^b f(x)\,dx = P(a < x < b)$

where P(a < x < b) denotes the probability with which x falls between
any two real values a and b for which a < b.
We also call f(x) or f a density function or probability function of the
continuous variable x.

The curve in Fig. 3.4 represents a smooth density function which goes
from $-\infty$ to $+\infty$. Since it is a density function, the three
properties of Eq. (3.17) must be satisfied. For example, the geometric
representation of property (c) is the area under the curve and above the x
axis which is between the two vertical lines x = a and x = b. This is a visual
image of the probability with which x falls in the interval between a and b.



Fig. 3.4 Theoretical Density Function for Continuous Variable x 

Comparing the definitions of discrete and continuous density functions 
as given by Eqs. (3.2) and (3.17), respectively, we see that they satisfy the same 
kinds of properties. However, there is one obvious difference which may 
prove to be annoying if it is not discussed. It is the way in which we think 
geometrically of probabilities. In the discrete case the probability of a set 
of (discrete) points in an interval is the sum of lengths of line segments above 
the points; in the continuous case the probability of the set of (all) points
in an interval is an area above the interval. In one case, we see probability 
measured in terms of lineal units (lengths) and in the other case it is measured 
in terms of square units (area). 

Since the area over a point is zero this leads us, in the continuous case,
to define the probability of a single value $x_0$ to be zero. In case there
are those who think this is unreasonable, consider the following argument:
let d be a small positive number. Then the probability of the set of points
in the interval from $x_0 - d$ to $x_0 + d$ is given by

$$\int_{x_0 - d}^{x_0 + d} f(x)\,dx$$

It is possible to find a value of x, say $x_1$, in the interval $x_0 - d$ to
$x_0 + d$ such that the area of the rectangle of height $f(x_1)$ above the
interval equals this integral, that is, the integral is $2d\,f(x_1)$.
Now for a given $x_1$, $f(x_1)$ is a fixed value, since f(x) is assumed to be
a single-valued continuous function. Thus

$$\lim_{d \to 0} 2d\,f(x_1) = 0$$

and so the probability of the reduced interval, a single point $x_0$, is zero.
Hence, in the continuous case it turns out that the density function value
$f(x_0)$ is not the probability for $x_0$, whereas in the discrete case the
value $f(x_0)$ is the probability for $x_0$. For these reasons we are led to
the use of f(x) dx, called the probability element or density element, when
we are discussing probabilities in the continuous case. Here dx is the usual
differential element defined in the calculus.
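
The limiting argument can be watched numerically. The sketch below is an
illustration under stated assumptions: it uses the standard normal
distribution (introduced in Sect. 3.3.2) as the continuous density, takes
$x_0 = 1$, and evaluates the distribution function through Python's
`math.erf`. None of these choices come from the text.

```python
import math


def standard_normal_cdf(t):
    """Distribution function of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def interval_probability(x0, d):
    """Probability that x falls between x0 - d and x0 + d."""
    return standard_normal_cdf(x0 + d) - standard_normal_cdf(x0 - d)


x0 = 1.0
density_at_x0 = math.exp(-x0 ** 2 / 2.0) / math.sqrt(2.0 * math.pi)
for d in (0.1, 0.01, 0.001, 0.0001):
    p = interval_probability(x0, d)
    # The probability shrinks toward 0, while p / (2d) approaches f(x0).
    print(d, p, p / (2.0 * d), density_at_x0)
```

The third column shows why f(x) dx, rather than f(x) alone, carries the
probability: the density value is the limit of probability per unit length.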

The interval a < x < b was used in Definition (3.17) when we gave the
third property of a density function. In view of the above presentation it is
clear that we could have used $a \le x \le b$, or $a < x \le b$, or
$a \le x < b$ without changing the probability associated with the intervals.
Thus, if we had used the interval $a \le x \le b$ in Eq. (3.17), property (c)
in Definitions (3.2) and (3.17) of a density function would have appeared the
same.

The distribution function F(x) of the continuous variable x, which is the
idealization of the cumulative polygon, is defined by

$$F(x) = \int_{-\infty}^{x} f(u)\,du \quad \left[\text{or } \int_{-\infty}^{x} f(t)\,dt\right] \tag{3.18}$$

It should be noted that the variable of integration is a dummy variable
and that the distribution function is a function of the upper limit of the
integral expression. It follows from Definition (3.18) that F(x) is a
nondecreasing function of x and

$$0 \le F(x) \le 1$$

From Eq. (3.18) it is clear how the distribution function can be obtained
from the density function. Conversely, f(x) can be found from F(x) by
differentiation; that is

$$f(x) = \frac{dF(x)}{dx} \quad [\text{or } f(x)\,dx = dF(x)] \tag{3.19}$$

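The mean $\mu$ and variance $\sigma^2$ of a continuous variable x with density
function f(x) are defined by

$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx \tag{3.20}$$

and

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx \tag{3.21}$$

Theorem 3.2. If f(x) is the density function of the continuous variable
x, then

$$\int_{-\infty}^{\infty} x^2 f(x)\,dx = \sigma^2 + \mu^2 \tag{3.22}$$

Proof. Starting with Eq. (3.21) and using Eqs. (3.20) and (3.17), we have

$$\begin{aligned}
\sigma^2 &= \int_{-\infty}^{\infty} (x^2 - 2\mu x + \mu^2) f(x)\,dx \\
         &= \int_{-\infty}^{\infty} x^2 f(x)\,dx - 2\mu \int_{-\infty}^{\infty} x f(x)\,dx + \mu^2 \int_{-\infty}^{\infty} f(x)\,dx
\end{aligned}$$

or

$$\sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - 2\mu^2 + \mu^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$$

Solving this last equation for

$$\int_{-\infty}^{\infty} x^2 f(x)\,dx$$

gives Eq. (3.22).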

Theorem 3.3. If x has distribution function F(x), then the probability of x
falling between any two points x = a and x = b (a < b) is given by F(b) - F(a).

Proof. The result follows immediately from Definition (3.18). Thus

$$\begin{aligned}
F(b) - F(a) &= \int_{-\infty}^{b} f(x)\,dx - \int_{-\infty}^{a} f(x)\,dx \\
            &= \int_{-\infty}^{a} f(x)\,dx + \int_{a}^{b} f(x)\,dx - \int_{-\infty}^{a} f(x)\,dx \\
            &= \int_{a}^{b} f(x)\,dx
\end{aligned}$$


Even though there are real and important differences between discrete 
and continuous distributions, it is clear that the forms used in defining density 
and distribution functions as well as means and variances are very similar. 
Thus, in much of the theoretical presentation in the remaining chapters we 
shall treat only the continuous case in detail and leave the discrete case for 
the reader. Generally, this requires only changing $\int \ldots dx$ to
$\sum$ and going through the same steps.

To avoid stating all definitions and theorems twice, some authors in
mathematical studies use a type of integral due to Stieltjes. The Stieltjes
integral in one summation process includes the summation denoted by $\sum$
and the usual Riemann integral (infinite summation) denoted by $\int$. We
shall not use this type of integral, since it is beyond the scope of this
book.

In order to illustrate the definitions of this section we shall now examine 
two very important continuous distributions. We consider the simplest 
distribution first. 


3.3.1. Uniform Distribution

If the density function of a variable x is constant over some region
(domain) and zero elsewhere, the variable x is said to be uniformly
distributed over that region. Thus, a continuous variable with density
function

$$f(x) = u(x; c, d) = \begin{cases} \dfrac{1}{d - c}, & \text{for } c \le x \le d \\ 0, & \text{otherwise} \end{cases} \tag{3.23}$$

is uniformly distributed over the interval from c to d. This is also referred
to as a rectangular distribution. The parameters are c and d. Thus, the
uniform distribution belongs to a two-parameter family of distributions. The
graph of a typical uniform distribution is shown in Fig. 3.5.



Fig. 3.5 A Typical Uniform Density Function


The distribution function is given by

$$F(x) = \begin{cases} 0, & \text{when } x < c \\ \dfrac{x - c}{d - c}, & \text{when } c \le x \le d \\ 1, & \text{when } x > d \end{cases} \tag{3.24}$$

and the typical graph is shown in Fig. 3.6.



Fig. 3.6 A Typical Uniform Distribution Function

According to Theorem 3.3 the shaded area in Fig. 3.5 is numerically the
same as the length of the line segment AB in Fig. 3.6. That is, the numerical
value of the length of AB is also the probability of x falling in the interval
from a to b. Using Eq. (3.10), we can find a similar line segment in the
discrete case which measures the probability of discrete values falling in the
closed interval from c to d. (By closed interval we mean that the end points
c and d are included in the interval.) The close similarity between discrete
and continuous distribution functions, as indicated by this property, is one
of the reasons why F(x) is at the heart of mathematical studies in statistics.

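Since the density function is symmetrical about x = (c + d)/2, the center
of the interval from c to d, we would expect the mean to be this value. Using
Eq. (3.20), we find that

$$\begin{aligned}
\mu &= \int_{-\infty}^{\infty} x f(x)\,dx \\
    &= \int_{-\infty}^{c} x \cdot 0\,dx + \int_{c}^{d} x \left(\frac{1}{d - c}\right) dx + \int_{d}^{\infty} x \cdot 0\,dx \\
    &= \frac{d^2 - c^2}{2(d - c)}
\end{aligned}$$

or

$$\mu = \frac{c + d}{2} \tag{3.25}$$

which verifies our expectation. It can be shown by using Theorem 3.2 and
Formula (3.25) that the variance is

$$\sigma^2 = \frac{(d - c)^2}{12} \tag{3.26}$$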

This distribution is of great importance both practically and
theoretically. In applied work it is useful in studying rounding errors in
measurements which are made within a specified accuracy. For example, weights
of humans are usually made and recorded to the nearest pound. It is assumed
that the difference in the recorded and true weight is some number between
-0.5 and 0.5 and that the error is uniformly distributed over this interval.
For this study the density function would be f(x) = 1 when -0.5 < x < 0.5
and f(x) = 0 otherwise. The mean would be 0 and the variance 1/12.
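
The mean and variance of a uniform variable can also be approximated
directly from Definitions (3.20) and (3.21). The sketch below replaces each
integral by a midpoint Riemann sum over the interval from c to d, outside of
which the density is zero; the step count and the function names
`uniform_density` and `moments_by_midpoint_rule` are assumptions made for
illustration.

```python
def uniform_density(x, c, d):
    """Uniform density u(x; c, d): constant 1/(d - c) on [c, d], zero elsewhere."""
    return 1.0 / (d - c) if c <= x <= d else 0.0


def moments_by_midpoint_rule(c, d, steps=100_000):
    """Approximate the mean (3.20) and variance (3.21) by midpoint sums over [c, d]."""
    h = (d - c) / steps
    mids = [c + (i + 0.5) * h for i in range(steps)]
    mean = sum(x * uniform_density(x, c, d) * h for x in mids)
    variance = sum((x - mean) ** 2 * uniform_density(x, c, d) * h for x in mids)
    return mean, variance


# Rounding-error illustration: errors uniform between -0.5 and 0.5.
mean, variance = moments_by_midpoint_rule(-0.5, 0.5)
print(mean, variance, 1.0 / 12.0)
```

The numerical results may be compared with the closed forms (c + d)/2 and
(d - c)^2/12 for any other choice of c and d.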

The uniform distribution is of greater importance theoretically due to 
the following very important theorem, which we state without proof. 

Theorem 3.4. Let f(x) be any density function of a continuous variable x
and F(x) be its distribution function. Then f(x) may be transformed to the
uniform density function

$$g(u) = 1, \qquad 0 < u < 1$$

by letting u = F(x).

It is clear that u must range from 0 to 1, since this is the range of F(x).
It is possible with the use of this theorem to exhibit many properties of
continuous distributions in general by proving them for this particular
uniform distribution. It follows from this theorem that there is at least one
transformation which transforms any continuous distribution into any other
continuous distribution. One transformation is obtained by combining the
transformation which transforms the first distribution into the uniform
distribution with the inverse (reverse) of the transformation which transforms
the second distribution into the uniform distribution.
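
Theorem 3.4 lends itself to a simulation check. The sketch below is only an
illustration under assumptions not found in the text: it draws pseudo-random
values from a normal distribution with mean 5 and standard deviation 2 (the
normal distribution is taken up in Sect. 3.3.2), applies u = F(x) using
Python's `statistics.NormalDist`, and then maps u through the inverse of a
uniform distribution function on the interval from 10 to 20.

```python
import random
from statistics import NormalDist, fmean, pvariance

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumed starting distribution: normal with mean 5 and standard deviation 2.
source = NormalDist(mu=5.0, sigma=2.0)
xs = [random.gauss(5.0, 2.0) for _ in range(20_000)]

# Theorem 3.4: u = F(x) should be uniformly distributed between 0 and 1.
us = [source.cdf(x) for x in xs]
print(fmean(us), pvariance(us))  # compare with 1/2 and 1/12

# Combining u = F(x) with the inverse of a second distribution function
# carries the first distribution into the second (here: uniform on 10 to 20).
c, d = 10.0, 20.0
ys = [c + (d - c) * u for u in us]
print(fmean(ys), pvariance(ys))  # compare with (c + d)/2 and (d - c)^2 / 12
```

Agreement of the sample mean and variance with the uniform values is of
course only a partial check; it does not by itself establish uniformity.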

3.3.2. Normal Distribution

The most important distribution in statistics is the normal distribution.
The normal density function $n(x; \mu, \sigma)$ is given by

$$f(x) = n(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}, \qquad -\infty < x < \infty \tag{3.27}$$

where $\mu$ and $\sigma$ are parameters which also happen to be the mean and
standard deviation of the continuous variable x which is normally distributed.
The normal density function is also called the Gaussian function and the error
function.

The graph of a typical normal density is given in Fig. 3.7. It is clear
that the curve is symmetrical about the line $x = \mu$ and hence by symmetry
the mean must be $\mu$. The function is defined for all real values of x and
has points of inflection at $x = \mu \pm \sigma$.



Fig. 3.7 Typical Normal Density Curve

Sometimes we refer to $\mu$ as the location parameter and $\sigma$ as the
scale parameter of this two-parameter family of distributions. For a fixed
$\sigma$, if we change $\mu$ the resulting curves keep the same shape but have
different locations along the x-axis. However, if $\mu$ is held fixed and
$\sigma$ is allowed to vary, the curves have different spreads. Figure 3.8
shows three curves with $\mu = 5$ and $\sigma = 1$, 2, and 3, respectively.
The area under each curve is one, but there are proportionately more values of
x further away from the mean when $\sigma$ gets larger and larger.

Since we defined $n(x; \mu, \sigma)$ to be a density function, it must be true
that

$$\int_{-\infty}^{\infty} n(x; \mu, \sigma)\,dx = 1 \tag{3.28}$$


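This fact is not obvious. Since n(x) is such an important function, we shall
take time to verify Eq. (3.28) and to become familiar with some of the
troublesome problems involved in manipulating n(x). Let the area under the
curve of n(x), as shown in Fig. 3.7, be A. Then we have

$$A = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(x - \mu)^2 / 2\sigma^2}\,dx$$

Letting $u = (x - \mu)/\sigma$, we obtain $dx = \sigma\,du$ and

$$A = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2}\,du$$

We could evaluate this integral by using a book of tables of integrals.
However, we use the following method to determine A:

Since u is a dummy variable, we may write

$$A = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-v^2/2}\,dv$$

Thus

$$A^2 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2}\,du \cdot \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-v^2/2}\,dv$$

or

$$A^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(u^2 + v^2)/2}\,du\,dv$$

Letting $u = r \sin\theta$ and $v = r \cos\theta$, we find that
$du\,dv = r\,d\theta\,dr$ and hence

$$\begin{aligned}
A^2 &= \frac{1}{2\pi} \int_{0}^{\infty} \int_{0}^{2\pi} r e^{-r^2/2}\,d\theta\,dr \\
    &= \int_{0}^{\infty} r e^{-r^2/2}\,dr \\
    &= \left( -e^{-r^2/2} \right) \Big|_{0}^{\infty} = 1
\end{aligned}$$

Since n(x) is always nonnegative, A is positive and hence A = 1.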
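
Equation (3.28) can also be confirmed numerically for particular parameter
values. The sketch below sums the normal density of Eq. (3.27) by the midpoint
rule over ten standard deviations on either side of the mean; the cutoff, the
step count, the trial parameters, and the function names are assumptions made
for illustration, and the area outside the cutoff is simply neglected.

```python
import math


def normal_density(x, mu, sigma):
    """Normal density n(x; mu, sigma) from Eq. (3.27)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def area_under_density(mu, sigma, half_width=10.0, steps=20_000):
    """Midpoint-rule area over mu +/- half_width standard deviations."""
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    return sum(normal_density(a + (i + 0.5) * h, mu, sigma) * h for i in range(steps))


for mu, sigma in ((0.0, 1.0), (5.0, 2.0), (21.69, 5.875)):
    # Each area should be very close to 1, whatever mu and sigma are.
    print(mu, sigma, area_under_density(mu, sigma))
```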
The normal distribution function $N(x; \mu, \sigma)$ is

$$F(x) = N(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(y - \mu)^2 / 2\sigma^2}\,dy \tag{3.29}$$

The integral in Eq. (3.29) cannot be expressed so as to have a simple
functional form, but it can be computed by numerical methods. Since Eq. (3.29)
changes whenever either $\mu$ or $\sigma$ changes, we consider a particular
member of this two-parameter family of distributions. Letting $\mu = 0$ and
$\sigma = 1$ and replacing x by t in Eq. (3.27), we obtain

$$n(t; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2} \tag{3.30}$$

which is called the standard normal density function. Hence, the standard
normal distribution function is given by

$$N(t; 0, 1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-u^2/2}\,du \tag{3.31}$$

Extensive tables of n(t) and N(t) (to 15 decimal places) have been compiled
by the New York Mathematical Tables Project [11] and others [4, 12]. For
our purposes the abbreviated Tables I and II in the back of this book are
adequate. Graphs of the standard normal density and distribution functions
are given in Figs. 3.9 and 3.10, respectively. Note that the shaded area in
Fig. 3.9 is numerically the same as the ordinate $N(t_1)$ in Fig. 3.10. This
shows the type of values given in Table II.

Before we consider uses of these tables, it should be observed that they
may be used to find ordinates and areas for any member of the family of
normal distributions. For if we let

$$t = \frac{x - \mu}{\sigma} \tag{3.32}$$

in the probability element $n(x; \mu, \sigma)\,dx$, we obtain

$$n(x; \mu, \sigma)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-t^2/2}\, \sigma\,dt = n(t; 0, 1)\,dt$$

where n(t; 0, 1) is the same function as the one given in Eq. (3.30). From
this point on, when n(t) and N(t) are used, it will be understood that the
standard normal distribution is being referred to.

Fig. 3.9 Standard Normal Density Curve

Fig. 3.10 Standard Normal Distribution Curve

Due to the symmetry of the normal distribution, only nonnegative values
of t are given in Tables I and II. Thus, to facilitate the use of Table I we
need the relation

$$n(-t) = n(t) \tag{3.33}$$

and to facilitate the use of Table II we need the relations

$$\int_{0}^{t} n(u)\,du = N(t) - 0.5 \tag{3.34}$$

$$\int_{-t}^{t} n(u)\,du = 2N(t) - 1 \tag{3.35}$$

$$N(-t) = 1 - N(t) \tag{3.36}$$

The proofs of these relationships will be left to the student.
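
Although the proofs are left as an exercise, relations (3.33) through (3.36)
are easy to spot-check numerically. In the sketch below the standard normal
distribution function is computed from Python's `math.erf` rather than read
from Table II, and the integrals are approximated by the midpoint rule; the
trial value t = 1.3 and the function names are assumptions chosen for
illustration.

```python
import math


def n(t):
    """Standard normal density, Eq. (3.30)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def N(t):
    """Standard normal distribution function, Eq. (3.31), via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def integral_of_n(a, b, steps=10_000):
    """Midpoint-rule approximation to the integral of n(u) from a to b."""
    h = (b - a) / steps
    return sum(n(a + (i + 0.5) * h) * h for i in range(steps))


t = 1.3
print(n(-t), n(t))                              # Eq. (3.33)
print(integral_of_n(0.0, t), N(t) - 0.5)        # Eq. (3.34)
print(integral_of_n(-t, t), 2.0 * N(t) - 1.0)   # Eq. (3.35)
print(N(-t), 1.0 - N(t))                        # Eq. (3.36)
```

Each printed pair should agree to many decimal places; a numerical check of
this kind is not a substitute for the proofs.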


3.3.3. Illustrations for the Normal Distribution 

Three intervals of importance in studies of bell-shaped distributions were
given in Sect. 2.3.3. Using Table II, we can now determine the proportion
of x values falling in any interval of a normally distributed variable.

Example 3.6. For the standard normal distribution find the probability
of values falling within one standard deviation of the mean; that is, find the
proportion of standard normal values falling between t = -1 and t = 1.
Also find the proportion falling between t = -2 and t = 2, t = -3 and
t = 3.

From Table II, using Eq. (3.35), we obtain

N(t = 1) - N(t = -1) = 2(0.8413) - 1 = 0.6826 = 68.26 per cent

N(t = 2) - N(t = -2) = 2(0.9772) - 1 = 0.9544 = 95.44 per cent

N(t = 3) - N(t = -3) = 2(0.9987) - 1 = 0.9974 = 99.74 per cent

Example 3.7. What symmetric interval about the mean of a standard
normal distribution contains (a) 90 per cent, (b) 95 per cent, (c) 99 per cent
of the t values?

For (a) we must find t such that 2N(t) - 1 = 0.9000 or N(t) = 0.9500.
Using linear interpolation in Table II, we get t = 1.645. Thus, the interval
is from -1.645 to 1.645. Similarly, we find the 95 per cent interval to reach
from -1.960 to 1.960 and the 99 per cent interval to reach from -2.575 to
2.575. (The three values 1.645, 1.96 and 2.575 are used so often that they
should be memorized.)

In general, if $t_\alpha$ denotes that value of the standard normal
distribution for which

$$\int_{t_\alpha}^{\infty} n(t; 0, 1)\,dt = \alpha$$

where $\alpha$ is a positive number less than 1/2, we say the
$100(1 - 2\alpha)$ per cent symmetric interval of x about $\mu$ reaches from
$\mu - t_\alpha \sigma$ to $\mu + t_\alpha \sigma$. That is,
$\mu \pm t_\alpha \sigma$ are the limits of a $100(1 - 2\alpha)$ per cent
symmetric interval about $\mu$.
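
The table look-ups of Examples 3.6 and 3.7 can be reproduced in software.
The sketch below uses `statistics.NormalDist` from the Python standard
library in place of Table II; the helper name `symmetric_limit` is
hypothetical. Because the computation carries full precision, the last digit
may differ from values obtained from a four-place table with interpolation
(for instance about 0.6827 rather than 0.6826, and about 2.576 rather than
2.575).

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

# Example 3.6: proportion within k standard deviations, 2 N(k) - 1 by Eq. (3.35).
within = {k: 2.0 * std.cdf(k) - 1.0 for k in (1, 2, 3)}


def symmetric_limit(coverage):
    """Value t with 2 N(t) - 1 = coverage, that is, N(t) = (1 + coverage) / 2."""
    return std.inv_cdf((1.0 + coverage) / 2.0)


# Example 3.7: limits of the 90, 95, and 99 per cent symmetric intervals.
limits = {c: symmetric_limit(c) for c in (0.90, 0.95, 0.99)}

print(within)
print(limits)
```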
Example 3.8. For the normal distribution with mean 5 and variance 4,
find what proportion of the x values fall between $x_1 = 1$ and $x_2 = 7$.
The area under the normal density curve n(x; 5, 2) from 1 to 7 is the same
as the area under the standard normal density curve from

$$t_1 = \frac{1 - 5}{2} = -2$$

to

$$t_2 = \frac{7 - 5}{2} = 1$$

Thus, using Table II and Eq. (3.36), we obtain

N(x = 7; 5, 2) - N(x = 1; 5, 2) = N(t = 1; 0, 1) - N(t = -2; 0, 1)

= N(t = 1) - [1 - N(t = 2)]

= 0.8413 - (1 - 0.9772)

= 0.8185
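
The standardizing step of Example 3.8 can be mirrored in a short
computation. The sketch below, which relies on `statistics.NormalDist` as a
stand-in for Table II, obtains the proportion once directly on the x-scale
and once on the t-scale after applying Eq. (3.32); the variable names are
illustrative only.

```python
from statistics import NormalDist

mu, sigma = 5.0, 2.0   # mean 5 and variance 4, so standard deviation 2
x1, x2 = 1.0, 7.0

# Direct computation on the x-scale.
fitted = NormalDist(mu, sigma)
direct = fitted.cdf(x2) - fitted.cdf(x1)

# Same area on the t-scale after standardizing with Eq. (3.32).
t1, t2 = (x1 - mu) / sigma, (x2 - mu) / sigma
std = NormalDist()
standardized = std.cdf(t2) - (1.0 - std.cdf(-t1))  # Eq. (3.36): N(-2) = 1 - N(2)

print(t1, t2, direct, standardized)
```

The two results agree with each other, and with the tabled answer up to the
rounding of the four-place table.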




The relation between the x-scale and t-scale is brought out in Fig. 3.11,
where the appropriate scales are placed under a normal curve. From Table
I we find that $n(t = 0) = 0.3989 \approx 0.40$, $n(t = 1) = 0.2420 \approx 0.24$,
and $n(t = 2) = 0.0540 \approx 0.05$. Hence, $n(x = 5) = n(t = 0)/2 \approx 0.20$,
$n(x = 7) \approx 0.12$, and $n(x = 9) \approx 0.03$. (The curve in Fig. 3.11
is not a "true picture" of either density function, since the lengths of the
units of measurements on the horizontal and vertical axes are different. It
should be understood that the two vertical scales differ also.)



Fig. 3.11 Comparison of Scales for Normal Density Curve 


Example 3.9. The teachers in a certain department of a large American
university assign grades to the beginning class by means of the normal
distribution. Determine how many of 500 students will receive each of the
grades A, B, C, D, and F if F is given to a student whose grade falls in the
interval $(-\infty, \mu - 1.5\sigma)$, D is given if the grade falls in the
interval $(\mu - 1.5\sigma, \mu - 0.5\sigma)$, etc.

The t values which determine the intervals are -1.5, -0.5, 0.5, and 1.5.
From Table II we find that the areas above these five intervals are 0.0668,
0.2417, 0.3830, 0.2417, and 0.0668, respectively. Then the number of students
who will be assigned each grade is as follows: $500(0.0668) = 33.4 \approx 33$
will receive the grade F, 121 the grade D, 192 the grade C, 121 the grade B,
and 33 the grade A.
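
The areas used in Example 3.9 can be recomputed from the distribution
function. In the sketch below `statistics.NormalDist` takes the place of
Table II and the expected number of students in each grade is left
unrounded; the list and dictionary names are illustrative assumptions.
Because the computed areas carry more than four places, the unrounded counts
may differ slightly from those built on tabled values.

```python
from statistics import NormalDist

std = NormalDist()
cuts = [-1.5, -0.5, 0.5, 1.5]         # t values separating F, D, C, B, A
grades = ["F", "D", "C", "B", "A"]
students = 500

# Distribution-function values at the cut points, padded with 0 and 1
# for the two open-ended tail intervals.
cumulative = [0.0] + [std.cdf(t) for t in cuts] + [1.0]
areas = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]
expected = {g: students * a for g, a in zip(grades, areas)}

for g in grades:
    print(g, round(expected[g], 1))
```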

Actually, once the percentages 7, 24, 38, 24, 7 have been determined, the
department would give the grade of A to the top 7 per cent, the grade of B
to the next 24 per cent, etc. It should be observed that, even though the
extreme theoretical intervals go to infinity, the practical limits are
$\mu - 2.5\sigma$ and $\mu + 2.5\sigma$ in this case. Thus all the practical
intervals have length $\sigma$ in this case.

Theoretical distributions were introduced in an effort to find simple 
functions which closely approximate real data and which serve as models for 
collections of measurements. The next example is given to illustrate how the 
model density curve can be fitted to the histogram. Later this example will
be used to determine if the fit is satisfactory.

Example 3.10. Fit a normal density curve to the teak tree data of
Exercise 2.10, assuming that we have reason to believe that this is a
representative sample from a normal distribution with mean $\mu$ and variance
$\sigma^2$.

This fitted curve serves as an approximation to the population curve under
the assumption that the sample size is large enough so that the sample mean
$\hat{\mu} = 21.69$ and sample variance $\hat{\sigma}^2 = 34.5156$ are close
to the population mean and variance. Table 3.2 illustrates how the normal
curve with mean $\hat{\mu}$ and variance $\hat{\sigma}^2$ is fitted to the
data. (The circumflex symbol ^ above the symbol for a parameter denotes an
estimate of the parameter.) Columns 5 and 6 of Table 3.2 may be used to
compare the area under the fitted curve and the histogram for the various
class intervals. In this problem there appears to be good agreement for all
intervals except possibly for the class interval 22.5-25.5.


Table 3.2

Fitting of Normal Curve to Teak Tree Histogram

 Upper                      Area to    Area over     Theoretical  Observed   Length of
 Class      t =             Left of t  Interval to   Frequency    Frequency  Ordinate at
 Boundary   (x - 21.69)                Left of x                             Class Mark
 x          / 5.875         A          ΔA            1088 ΔA      f          1088 n(x)

  7.5        -2.42          0.0078     0.0078          8.5          8          2.1
 10.5        -1.90          0.0287     0.0209         22.7         26          7.2
 13.5        -1.39          0.0823     0.0536         58.3         50         18.9
 16.5        -0.88          0.1894     0.1071        116.5        120         38.6
 19.5        -0.37          0.3557     0.1663        180.9        181         60.6
 22.5         0.14          0.5557     0.2000        217.6        215         73.4
 25.5         0.65          0.7422     0.1865        202.9        213         68.5
 28.5         1.16          0.8770     0.1348        146.7        145         49.3
 31.5         1.67          0.9525     0.0755         82.1         76         27.3
 34.5         2.18          0.9854     0.0329         35.8         36         11.5
 37.5         2.69          0.9964     0.0110         12.0         18          3.8

It is not generally necessary actually to graph the fitted normal curve
when it is being compared with the frequency histogram. However, if for
any reason a graph seems desirable, we multiply the density function n(x;
21.69, 5.875) by the sample size 1088 so that the area under the resulting
curve is the same as that of the histogram. Using Table I, we obtain the
ordinates of the fitted curve shown in Table 3.2, where the ordinate of the
class mark is given by 1088 n(x) = 1088 n(t)/5.875 = 185.2 n(t). Figure 3.12
pictures the goodness of fit. In Sect. 15.2 we present a statistical procedure
designed to "test the goodness of fit."
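
The theoretical frequencies of Table 3.2 can be regenerated without table
look-ups. The sketch below is an illustration, not the procedure of the
text: it uses `statistics.NormalDist` with the estimates 21.69 and 5.875, and
the upper class boundaries and observed frequencies as read from Tables 3.2
and 3.3. Since t is not rounded to two places here, the results may differ
from the tabled column in the last digit.

```python
from statistics import NormalDist

fitted = NormalDist(mu=21.69, sigma=5.875)  # estimates quoted in Example 3.10
n_obs = 1088

# Upper class boundaries and observed frequencies as read from Tables 3.2 and 3.3.
upper = [7.5, 10.5, 13.5, 16.5, 19.5, 22.5, 25.5, 28.5, 31.5, 34.5, 37.5]
observed = [8, 26, 50, 120, 181, 215, 213, 145, 76, 36, 18]

areas_left = [fitted.cdf(x) for x in upper]
# Area over each class interval; the first class takes everything to the left of 7.5.
interval_areas = [areas_left[0]] + [b - a for a, b in zip(areas_left, areas_left[1:])]
theoretical = [n_obs * a for a in interval_areas]

for x, obs, th in zip(upper, observed, theoretical):
    print(f"{x:5.1f}  {obs:4d}  {th:7.1f}")
```

Printing the observed and theoretical columns side by side makes the
comparison of columns 5 and 6 immediate.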

There is a special kind of graph paper, called normal probabiliiy (or 
cumulative normal) graph paper, which may be used to determine whether a 
set of observations may have been drawn from a normal distribution. This
graph paper is prepared by transforming the vertical scale (as is indicated 
in Example 3.11) so that a cumulative normal distribution curve such as 
the one shown in Fig. 3.10 becomes a straight line. Since observations are so 
often assumed to be drawn from a normal distribution, this graph paper 
is very useful in practice. Figure 3.13 shows normal probability graph paper 
as it is used in Example 3.11. The reader should note that Theorem 3.4 can 
be used to transform any continuous distribution function so that the graph 
of the transformed distribution function appears as a straight line. 

Example 3.11. Plot the teak tree data of Exercise 2.10 on normal proba- 
bility graph paper. 

First, we find the cumulative relative frequencies below the class
boundaries, as shown in Table 3.3.


Table 3.3

Cumulative Relative Frequencies for the Teak Tree Data

 Upper Class    Cumulative     Cum. Relative
 Boundaries     Frequencies    Frequencies

    7.5              8            0.007
   10.5             34            0.031
   13.5             84            0.077
   16.5            204            0.188
   19.5            385            0.354
   22.5            600            0.551
   25.5            813            0.747
   28.5            958            0.881
   31.5           1034            0.950
   34.5           1070            0.983
   37.5           1088            1.000

Then we place 7.5, 10.5, ..., 37.5 at equal intervals
on the horizontal scale of normal probability graph paper, using as much of
the scale as is convenient. Along the right-hand vertical scale, which is
already marked, we locate the cumulative relative frequencies and plot points
with class boundaries as abscissas and cumulative relative frequencies as
ordinates, as indicated in Fig. 3.13. Using a straight edge, we draw the best
fitting line by sight. In this case, the points either fall on or very near
the line. Hence, we conclude that the observations could have come from a
normal population. This is the method commonly used to determine whether the
frequent assumption that the observations are drawn from a normal population
is tenable. Note that the left-hand vertical scale in Fig. 3.13 is used when
we obtain cumulative relative frequencies above class boundaries.



Furthermore, when the fit is good we can estimate the mean and variance
of the normal distribution. The 0.50 point on the vertical scale corresponds
to the estimated mean $\hat{\mu}$ on the horizontal scale and the 0.8413 point
(see Table II or Example 3.6) corresponds to $\hat{\mu} + \hat{\sigma}$. Thus
$\hat{\sigma}$ can be obtained by subtraction. In particular, we find that
$\hat{\mu} = 21.6$ and $\hat{\mu} + \hat{\sigma} = 27.4$. Thus
$\hat{\sigma} = 27.4 - 21.6 = 5.8$. We see that the mean and standard deviation
obtained by graphic methods are very near those obtained by numerical
methods in Example 3.10.
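The graphical estimate can be imitated numerically. The sketch below is an illustration under stated assumptions, not the procedure of the text: the transformed vertical scale of the paper is replaced by the inverse standard normal distribution function (`NormalDist().inv_cdf`), and the line drawn by sight is replaced by an ordinary least-squares line. The mean is then read where the line crosses a cumulative probability of one-half, and the standard deviation as the horizontal distance to the 0.8413 point.

```python
from statistics import NormalDist

# Upper class boundaries and cumulative frequencies from Table 3.3.
boundaries = [7.5, 10.5, 13.5, 16.5, 19.5, 22.5, 25.5, 28.5, 31.5, 34.5, 37.5]
cum_freq = [8, 34, 84, 204, 385, 600, 813, 958, 1034, 1070, 1088]
n = cum_freq[-1]

std = NormalDist()

# Transform each cumulative relative frequency p to the deviate t with N(t) = p.
# The last point (p = 1) has no finite deviate, so it is left off, just as it
# cannot be plotted on the paper.
points = [(x, std.inv_cdf(c / n)) for x, c in zip(boundaries, cum_freq) if c < n]

# Least-squares line t = a + b * x (an assumed stand-in for the line drawn by sight).
m = len(points)
mean_x = sum(x for x, _ in points) / m
mean_t = sum(t for _, t in points) / m
sxx = sum((x - mean_x) ** 2 for x, _ in points)
sxt = sum((x - mean_x) * (t - mean_t) for x, t in points)
b = sxt / sxx
a = mean_t - b * mean_x

mu_hat = -a / b                   # abscissa where t = 0 (cumulative probability 0.50)
sigma_hat = (1 - a) / b - mu_hat  # distance to t = 1 (cumulative probability 0.8413)
print(round(mu_hat, 2), round(sigma_hat, 2))
```

A least-squares line weights the extreme points more heavily than the eye usually does, so its estimates need not coincide exactly with those read from the paper.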

The reader should be cautious in drawing conclusions when fitting
normal curves to data. Even though the fitted curve may very nearly fit the
data, as illustrated by Examples 3.10 and 3.11, it does not necessarily follow 
that the sample was drawn from the indicated normal population. 

The above examples were given to familiarize the reader with the normal 
distribution and to indicate some of its many uses. As the theory develops, 
we shall consider other uses of this distribution. At this time we give some 
reasons why the normal distribution is so important in the theory and appli- 
cation of statistics. The reasons are 

1. Distributions encountered in practice are frequently believed to be 
normal or approximately normal. 

2. The mathematics is highly developed and relatively simple. 

3. It is important as a limiting distribution. 

3.3.4. Exercises 

3.12. Let

$$f(x) = \begin{cases} c, & \text{when } -2 < x < 3 \\ 0, & \text{otherwise} \end{cases}$$

Determine c so that f(x) is a density function. Then find the distribution
function and graph both functions.

3.13. Let $f(x; \theta) = 1/\theta$, $0 \le x \le \theta$, be a one-parameter family of
distributions. Graph the density curves for $\theta = 1$; 2.

3.14. (a) The variable x is uniformly distributed over the interval 0 < x ^ 12. 
Determine the density and distribution functions of x and graph both, 
(b) Determine the mean and variance of the variable x. (c) Assume that 
x represents the position of the hour hand on the face of a clock in a jewelry
shop which strikes the hour every time the hand reaches 12. Every day
at a time determined by chance an individual aimlessly walks about 
town and stops in this shop. On what proportion of his trips would 
he arrive at least 20 minutes before the clock strikes? 

3.15. Let $f(x; \theta) = c/2$, $(\theta - 2) \le x \le (\theta + 2)$, be a one-parameter family
of distributions. (a) Determine c so that $f(x; \theta)$ is a density function.
(b) Find the mean and variance of x. (c) Graph two density curves from 
this family of distributions. 

3.16. Prove Formula (3.26), using Theorem 
3.2 and Formula (3.25). 

3.17. The variable x has density function

$f(x) = 1$, $(k - \tfrac{1}{2}) \le x \le (k + \tfrac{1}{2})$,

where k is an arbitrary constant. For k = 10 in. with what probability
would we obtain the particular sample 70.4, 69.7, 70.1? Explain.

3.18. It is known that x is distributed as is indicated by the wide-line
graph at the bottom of page 83. (a) Determine the density and distribution
functions for x. (b) Find the mean and variance of this distribution.
(c) Find the proportion of x values between $\tfrac{1}{2}$ and 1.

3.19. The triangular density function is given by

$f(x; a, b) = \frac{1}{a}\left(1 - \frac{|x - b|}{a}\right)$ for $|x - b| < a$.

(a) Find the mean and variance of this distribution. (b) Graph the density
curve for a = 2 and b = 0; for a = 1 and b = 1. (c) Find the distribution
function and graph for one particular curve, say, for a = b = 1. (d)
When a = b = 1, find the probability that x falls between 0.8 and 1.3.
SJO Find e so that /{x) *eJt. 0£JcS>, « a density function, and then 
find the mean and variance of the distribution of x 
3.21. Use Theorem 3.4 to find the transformation which transforms the
distribution in Exercise 3.20 into the uniform density function f(u) = 1,
0 ≤ u ≤ 1.

3.22. (a) Use Theorem 3.4 to transform the triangular distribution in Exercise
3.19 into the unit uniform density function. (b) What values of u for the
transformed distribution correspond to the values x = −1, −0.7, −0.3,
0, 0.3, 0.7, 1, when the particular triangular density function f(x; 1, 1)
is used?

3.23. Assume x to be normally distributed with mean 4 and standard deviation
3. Find

(a) the proportion of values greater than x = 7, i.e., P[x > 7]

(b) FIl S * < 

(c) P[−2 ≤ x ≤ 7]

(d) P[0 < x < 5]

3.24. Prove each of the following: (a) Formula (3.34). (b) Formula (3.35).
(c) Formula (3.36).

3.25. If the mean lifetime of a certain kind of battery is 600 days, with a
standard deviation of 50 days, what percentage of this type of battery can be
expected to last anywhere from 500 to 800 days? Assume that the lifetimes
are normally distributed.

3.26. The bacteria content of a canned food product must be less than 70
to be acceptable. Long experience indicates that the mean bacteria
content is 68 with a standard deviation of 0.9. What proportion of the
cans must be declared not acceptable, if the bacteria content of the
cans is assumed to be normally distributed?

3.27. A certain school with high standards requires that a student have a
verbal score on a given college board examination in excess of 540 in
order to be considered for admission. It is known that the scores are
approximately normally distributed with mean 490 and standard deviation
80. What percentage of students taking the college board examination
would not be considered for admission because of a low score?

3.28. A one in. bolt manufactured at a certain plant is rejected unless its 
diameter measures between 0.98 and 1.02 in. If the diameters of bolts 
made at this plant are approximately normally distributed with mean 
1.000 in. and variance 0.000049 sq. in. what percentage of bolts are 
rejected due to improper size? 

3.29. If x is normally distributed with mean $\mu$ and variance $\sigma^2$, find

(a) P[X>IJL + j;] 

(b) PKii 

(c) P[x >(/x + 1) or x<(li- 2)] 

3.30. If x is normally distributed with mean 10 and variance 4, find a number
$x_0$ such that

(a) $P[x < x_0] = 0.05$

(b) $P[12 < x < x_0] = 0.10$

(c) $P[x > |x_0|] = 0.20$

3.31. If X is normally distributed with mean 10 and variance 9, find the limits 
of a symmetric interval about the mean such that the area above the 
interval and below the normal density curve is (a) 0.90, (b) 0.95, (c) 0.98. 

3.32. If the grades in your class in statistics were assigned by using the normal 
distribution, how many would receive each of the grades A, B, C, D, F? 

3.33. (a) The grades in a certain school are A, B+, B, C+, C, D, F. What 
proportion of the students in a class would receive each grade if they
were assigned by means of the normal distribution ? (b) What proportion 
of students in a school giving only the grades “excellent,” “good,” “pass- 
ing,” and “failing” would receive each grade if they were assigned by 
means of the normal distribution? 

3.34. Use data in Exercise 2.7 to prepare a new frequency table with intervals
0-4, 4-8, 8-12, etc. (a) Compute the mean $\hat{\mu}$ and variance $\hat{\sigma}^2$ for this data.

Answer. $\hat{\mu} = 17.74$; $\hat{\sigma}^2 = 19.1977$.

(b) Fit a normal density curve to this data. How good is the fit? (c) Plot
this data on normal probability graph paper. (d) Use (c) to estimate
graphically the mean and standard deviation of the fitted normal
distribution. (e) Estimate graphically the standard deviation of the fitted
normal distribution in (c) by finding $\hat{\mu} - 2\hat{\sigma}$ and $\hat{\mu} + 2\hat{\sigma}$.

3.35. Use the data in Exercise 2.8 to answer (a), (b), (c), (d), and (e) of
Exercise 3.34. Answer to (a): $\hat{\mu} = 34.50$, $\hat{\sigma}^2 = 45.8082$.

3.36. Use the data in Table 2.4 to answer (a), (b), (c), (d), and (e) of
Exercise 3.34. Answer to (a): $\hat{\mu} = 6.256$, $\hat{\sigma}^2 = 0.3776$.

Hint. In computing the mean and variance, let 4.85 and 7.45 be the
class marks of the lower and upper classes respectively.

3.37. (a) Find c so that $f(x) = ce^{-x}$, x > 0, is a density function. (b) Find the
mean and variance of x.

3.38. (a) Find c so that $f(x) = cx^{\alpha}e^{-x}$, x ≥ 0, $\alpha$ a positive integer, is a
density function.

Hint. Use the fact that

$$\int_0^{\infty} x^{\alpha} e^{-x}\,dx = \alpha!$$

when $\alpha$ is a positive integer.
(b) Find the mean and variance of x.

3.39. (a) Show that the area under the Cauchy density function
$f(x; m) = \{\pi[1 + (x - m)^2]\}^{-1}$, $-\infty < x < \infty$, is one. (b) Graph f(x; 0) and the
standard normal density curves using the same axes.

Note. It can be shown that the moments above the first, defined in
Exercise 3.45, all turn out to be infinite and the mean is defined in a
restricted sense.

3.40. (a) Show that $f(x; \theta) = (\theta + 1)x^{\theta}$, $\theta > 0$, 0 ≤ x ≤ 1, is a density
function. (b) Graph f(x; 1), f(x; 2), and f(x; 3). (c) Find the mean and
variance of x with density function $(1 + \theta)x^{\theta}$, $\theta > 0$, 0 ≤ x ≤ 1.

3 41. A density function of x is defined by 

(o, otherwise 

(a) Find the distribution function and sketch its graph (b) Compute 
Fix < s/31 and F(s/3 <x < 2»/3) (c) Determine x, such that Fix < 
X,) as 0 05 (d) Determine x, such that PHn - Xo) < f/t + x»)J « 0 95 

3.42. We said after giving Eq. (3.29) that $N(x; \mu, \sigma)$ [or $N(t; 0, 1)$] can be
computed by numerical methods. When t is small the standard normal
distribution function $N(t; 0, 1) = N(t)$ may be evaluated by using the
series expansion

$$N(t) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\left(t - \frac{t^3}{2\cdot 3} + \frac{t^5}{2^2\,2!\,5} - \frac{t^7}{2^3\,3!\,7} + \cdots\right) \qquad (3.37)$$

(a) Evaluate N(0.1) and N(0.2) using the first three terms in the parentheses
and compare with the values found in Table II. (b) Prove Eq. (3.37),
using the series expansion of $e^{-t^2/2}$.

3.43. For large values of t, Eq. (3.37) converges too slowly to be of practical
use. For large values of t the following series may be used:

$$N(t) = 1 - n(t)\left(\frac{1}{t} - \frac{1}{t^3} + \frac{1\cdot 3}{t^5} - \frac{1\cdot 3\cdot 5}{t^7} + \cdots \pm R\right) \qquad (3.38)$$

where R is numerically less than the last term considered. (a) Evaluate
N(2) and N(3), using the first four terms in the parentheses and compare
with the values found in Table II. Use R to determine the maximum error
in N(2) and N(3) accurate to four decimal places. (b) Prove Eq. (3.38)
by repeatedly integrating by parts.

3.44. A more useful method of computing N(t) for all values of t is given by
a continued fraction expression due to Laplace. The expression is

$$N(t) = 1 - n(t)\cdot\cfrac{1}{t + \cfrac{1}{t + \cfrac{2}{t + \cfrac{3}{t + \cdots}}}} \qquad (3.39)$$

Use Eq. (3.39) to compute N(0.2) and N(1). Compare these results with
those found in Table II.

Note. There are other numerical methods for computing tables of 
the standard normal distribution function. These methods are particularly 
useful when high-speed computing machines are available. 

3.45. Generally, moments of the first and second order are adequate for most
practical and theoretical work. However, occasionally moments of higher
order are useful. Thus, the kth moment about the origin of a continuous
variable x with density function f(x) is defined by

$$\mu'_k = \int_{-\infty}^{\infty} x^k f(x)\,dx \qquad (3.40)$$

where k is a positive integer. Sometimes it is desirable to calculate the
moments of a continuous function h(x) of x. Thus, the kth moment of
a continuous function h(x) = h [of x which has density function f(x)]
is defined by

$$\mu'_{k:h} = \int_{-\infty}^{\infty} h^k(x) f(x)\,dx \qquad (3.41)$$

One particular function of importance is $h(x) = x - \mu$. Hence, substituting
$h = x - \mu$ in Eq. (3.41) and replacing $\mu'_{k:x-\mu}$ by $\mu_k$, we have the
formula which gives the kth moment about the mean; that is,

$$\mu_k = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx \qquad (3.42)$$

(a) Prove that $\mu_2 = \mu'_2 - (\mu'_1)^2$. Note that $\mu_2 = \sigma^2$ and $\mu'_1 = \mu$. (b)
Derive a formula for calculating $\mu_k$ in terms of $\mu'_k, \mu'_{k-1}, \ldots, \mu'_2, \mu'_1$. (c)
Use the formula found in (b) to obtain $\mu_2$, $\mu_3$, and $\mu_4$.

3.46. Find the third and fourth moments about the mean, $\mu_3$ and $\mu_4$, for the
(a) uniform variable with density function given by Eq. (3.23), (b) triangular
variable with density function f(x; 1, 1) given in Exercise 3.19, (c) variable
with density function determined in Exercise 3.20, (d) variable with 
density function determined in Exercise 3.38. 

3.47. For many distributions, moments of higher order than the first or second
are difficult to determine directly from the definition. An indirect method
using what is known as the moment generating function (MGF) is often
employed. The MGF is also very desirable to have for theoretical
considerations. Thus, the MGF, $M_x(t)$, of a continuous variable x with
density function f(x) is defined by

$$M_x(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx \qquad (3.43)$$

The MGF is a function of t only, the subscript x being used to indicate
the variable of the distribution. We assume that f(x) is a density function
such that $M_x(t)$ converges for some values of t. Expanding $e^{tx}$ in a power
series, substituting this in Eq. (3.43), and evaluating gives

$$M_x(t) = 1 + \mu'_1 t + \mu'_2 \frac{t^2}{2!} + \cdots + \mu'_k \frac{t^k}{k!} + \cdots \qquad (3.44)$$

The coefficient of $t^k/k!$ in this expansion is the kth moment about the
origin. Thus, if the MGF is known or can be found and can be expanded
into a power series which is convergent for some values of t, then the
kth moment $\mu'_k$ can be obtained by inspection. For certain density
functions f(x) the MGF is of such a form that it is more convenient to find
the kth moment given by

$$\mu'_k = \left[\frac{d^k M_x(t)}{dt^k}\right]_{t=0} \qquad (3.45)$$

where the right-hand side denotes the kth derivative of the MGF with
respect to t evaluated at t = 0.
Note. It should be pointed out that the MGF does not always exist.
However, for most density functions used in practice, it does exist.
(a) Prove Formula (3.44). (b) Prove Formula (3.45).

3.48. (a) Show that the MGF of the uniform density function defined in Eq.
(3.23) is

$$M_x(t) = \frac{e^{bt} - e^{at}}{t(b - a)}$$

(b) Use Eq. (3.44) to show that the kth moment about the origin is

$$\mu'_k = \frac{b^{k+1} - a^{k+1}}{(k + 1)(b - a)}$$

(c) Use Eq. (3.45) to find the kth moment $\mu'_k$.

(d) Find the MGF and kth moment $\mu'_k$ for the unit uniform density
function f(x) = u(x; 0, 1).

(e) Find $\mu_3$ and $\mu_4$.

3.49. (a) Let u have the standard normal distribution with density function
$n(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$. Show that the MGF of u, $M_u(t)$, is $e^{t^2/2}$.

Hint. Complete the square in the exponent of the integrand of $M_u(t)$.
(b) Find the moments of the standard normal distribution.

3.50. (a) Find the MGF of the variable x with density function given in
Exercise 3.37. (b) Find $\mu'_k$ from the MGF.

3.51. (a) Show that the MGF of the variable x with density function
$f(x) = x^{\alpha} e^{-x}/\alpha!$,
where $\alpha$ is a positive integer and x > 0, is $(1 - t)^{-(\alpha+1)}$. (b) Find $\mu$ and
$\sigma^2$. Compare with results of Exercise 3.38(b).

3.52. In order to generate moments of the type given by Eq. (3.41), we generalize
the definition of MGF given in Exercise 3.47. Thus, to generate moments
of h(x), an arbitrary function of x, we replace x by h(x) in Eq. (3.43),
giving

$$M_{h(x)}(t) = \int_{-\infty}^{\infty} e^{t\,h(x)} f(x)\,dx \qquad (3.46)$$

If c is an arbitrary constant and h(x) is any function of x for which the
MGF exists, then prove that

$$M_{c\,h(x)}(t) = M_{h(x)}(ct) \qquad (3.47)$$

and

$$M_{h(x)+c}(t) = e^{ct} M_{h(x)}(t) \qquad (3.48)$$

Note. The MGF and the generalized MGF along with the properties
of Eqs. (3.47) and (3.48) are very useful in the derivation of numerous
theorems. We shall use these relations in Chap. 5 to great advantage.

3.53. (a) Show that the variable x with normal density function $n(x; \mu, \sigma)$ has
the MGF given by

$$M_x(t) = e^{\mu t + \sigma^2 t^2/2} \qquad (3.49)$$

Hint. Use Exercise 3.49 and properties (3.47) and (3.48).

(b) Use Eq. (3.49) to find $\mu'_1$, $\mu'_2$, $\mu'_3$.

(c) Find $\mu_3$ and $\mu_4$.
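The note to Exercise 3.44 mentions numerical methods suited to computing machines. As one illustration, the Laplace continued fraction 1/(t + 1/(t + 2/(t + 3/(t + ...)))) can be evaluated from the bottom up. This is only a sketch under stated assumptions: the fraction is cut off at an arbitrarily chosen depth of 500 levels, t is taken positive, and `statistics.NormalDist().cdf` stands in for Table II as the value to compare against.

```python
import math
from statistics import NormalDist


def n_density(t):
    """Standard normal density n(t)."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def laplace_fraction(t, depth=500):
    """Truncated continued fraction 1/(t + 1/(t + 2/(t + 3/(t + ...)))), t > 0."""
    value = t
    for k in range(depth, 0, -1):
        value = t + k / value  # build the fraction from the innermost level outward
    return 1 / value


def N_continued_fraction(t, depth=500):
    """N(t) = 1 - n(t) * (continued fraction), for t > 0."""
    return 1 - n_density(t) * laplace_fraction(t, depth)


for t in (1.0, 2.0, 3.0):
    print(t, N_continued_fraction(t), NormalDist().cdf(t))
```

The truncated fraction settles down more slowly as t approaches zero, so small values of t need many more levels than the depth assumed here.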


3.4. MULTIVARIATE DISTRIBUTIONS 

The definitions and theorems of this chapter can be generalized to more 
than one variable. Thus, we denote a density function in two variables x and
y by f(x, y) and one in n variables $x_1, x_2, \ldots, x_n$ by $f(x_1, x_2, \ldots, x_n)$. In the
continuous case, f(x, y) represents a surface in three dimensions and
$f(x_1, x_2, \ldots, x_n)$ represents a hypersurface in n + 1 dimensions. The volume
under the surface f(x, y) and above the rectangular region determined by
a < x < b and c < y < d gives the probability that the pair of variables
x and y fall in this rectangle. Generalizing Eq. (3.17), we say that the n
continuous variables $x_1, x_2, \ldots, x_n$ have a multivariate (joint) density
function $f(x_1, x_2, \ldots, x_n) = f$ if the following conditions are satisfied:

(a) $f(x_1, x_2, \ldots, x_n)$ is a single-valued nonnegative real number
for all real values of $x_1, x_2, \ldots, x_n$

(b) $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n = 1$

(c) $\int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n = P[a_1 < x_1 < b_1, \ldots, a_n < x_n < b_n]$     (3.50)

where $P[a_1 < x_1 < b_1, \ldots, a_n < x_n < b_n]$ denotes the probability
with which $x_1$ falls between any two real values $a_1$ and $b_1$, $x_2$ falls
between any two real values $a_2$ and $b_2$, . . . , $x_n$ falls between any
two real values $a_n$ and $b_n$ simultaneously.

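The student will not have any difficulty giving the defining properties for a
multivariate density function $f(x_1, x_2, \ldots, x_n)$ for n discrete variables.

The multivariate (joint) distribution function $F(x_1, \ldots, x_n)$ of the n
continuous variables $x_1, \ldots, x_n$ is given by

$$F(x_1, \ldots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f(t_1, \ldots, t_n)\,dt_1 \cdots dt_n \qquad (3.51)$$

where the dummy variables $t_1, \ldots, t_n$ replace $x_1, \ldots, x_n$ in the integrand,
F being a function of the upper limits $x_1, \ldots, x_n$. It follows that

$$0 \le F(x_1, \ldots, x_n) \le 1 \qquad (3.52)$$

for all values for which $f(x_1, \ldots, x_n)$ is defined.

The mean $\mu_i$ and variance $\sigma_i^2$ of the variable $x_i$ (i = 1, 2, . . . , n) are given
by

$$\mu_i = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i\, f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n \qquad (3.53)$$

and

$$\sigma_i^2 = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_i - \mu_i)^2 f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n \qquad (3.54)$$

A new parameter $\sigma_{ij}$ involving a pair of variables $x_i$ and $x_j$ ($i \ne j$), called
the covariance of $x_i$ and $x_j$ (i, j = 1, 2, . . . , n), is defined by

$$\sigma_{ij} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_i - \mu_i)(x_j - \mu_j) f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n \qquad (3.55)$$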
When the variables x and y are uncorrelated in a certain sense which is
discussed at length in Chap. 4, we say that the variables are independently
distributed. For the moment we give the following definition:

The variables $x_1, \ldots, x_n$ are independently distributed if and
only if their joint density function $f(x_1, \ldots, x_n)$ can be expressed
as a product of the marginal density functions $f_1(x_1) \cdots f_n(x_n)$
for all values of $x_1, \ldots, x_n$ for which f is defined.     (3.56)
The marginal density function $f_i(x_i)$ for $x_i$ (i = 1, . . . , n) is
given by integrating $f(x_1, \ldots, x_n)$ with respect to all other
variables $(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$ between the limits $-\infty$
and $\infty$.

It should be observed that the variables $x_1, \ldots, x_n$ in the function
$f(x_1, \ldots, x_n)$ are always independent in the usual mathematical sense, but they
are not necessarily independently distributed. Usually we say that variables
which satisfy Definition (3.56) are independent in the probability sense. The
illustrations and exercises should clarify the concept.

Now we give brief descriptions of two bivariate distributions. Further 
properties of joint distributions will be discussed and illustrated in later 
chapters. 


3.4.1. The Bivariate Normal Distribution 

The bivariate normal distribution is just as important (or even more
important) among bivariate distributions as the univariate normal
distribution is among univariate distributions. The bivariate normal density
function $n(x, y; \mu_x, \mu_y, \sigma_x, \sigma_y, \rho) = n(x, y)$ is given by

$$f(x, y) = n(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]\right\} \qquad (3.57)$$

where the parameters $\mu_x, \mu_y, \sigma_x^2, \sigma_y^2$ are the means and variances of x and y,
and $\rho$, called the correlation coefficient, is defined by

$$\rho = \frac{\sigma_{xy}}{\sigma_x\sigma_y} \qquad (3.58)$$

If we define a new function n*(x, y) obtained from Eq. (3.57) by replacing
$\mu_x, \mu_y, \sigma_x, \sigma_y, \rho$, and $1/(2\pi\sigma_x\sigma_y\sqrt{1-\rho^2})$ with a, b, c, d, e, and f respectively,
it can be shown that, in order for n*(x, y) to be a density function satisfying
Eqs. (3.53), (3.54), (3.55), and (3.58), the constants a, b, c, d, e, f must, in
fact, have the specific values given in Eq. (3.57). This will be left as an exercise
for the student.

It is informative to study the nature of the density surface. We note that
the density function n(x, y) is constant when the exponent of e in Eq. (3.57)
is equal to −k, where k is a positive constant, that is, when

$$\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2 = 2(1 - \rho^2)k \qquad (3.59)$$

The points satisfying Condition (3.59) lie on an ellipse with center at the point
$(\mu_x, \mu_y)$. The location of the major axis of the ellipse depends on $\sigma_x$, $\sigma_y$, and
$\rho$. The major axis has positive slope when $\rho > 0$ and negative slope when
$\rho < 0$. In case $\sigma_x = \sigma_y$ and $\rho \ne 0$, the slope is either 1 or −1. When $\rho = 0$,
the major axis is parallel to the x axis if $\sigma_x > \sigma_y$ and is parallel to the y axis
if $\sigma_x < \sigma_y$. When $\sigma_x = \sigma_y$ and $\rho = 0$, the ellipse reduces to a circle. The
ellipse (3.59) is known as a contour ellipse. It indicates the type of scatter of
points (x, y) taken from a bivariate normal population.
For a given bivariate normal density function, as k becomes larger, the
corresponding contour ellipse becomes larger, but the value of n(x, y)
decreases. Figure 3.14 shows the normal density surface for $\rho > 0$ along with
two contour curves, indicating the relation of k to the surface. The contour
ellipses for the bivariate normal distribution are analogous to the limits
$\mu \pm t_{\alpha}\sigma$ for the univariate normal distribution. Thus, it is possible to find
a $k_{\alpha}$ such that the volume under the surface and inside the contour ellipse
is 100(1 − 2α) per cent of the total volume.
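The relation between k and the enclosed volume can be made explicit. The sketch below relies on a standard result that is not derived in this section: for a bivariate normal distribution the volume inside the contour ellipse on which the exponent equals −k is 1 − e^(−k), whatever the value of the correlation coefficient. The function names, the choice of 0.95, the correlation 0.6, and the seeded Monte Carlo check are all illustrative assumptions.

```python
import math
import random


def coverage(k):
    """Volume inside the contour ellipse on which the exponent equals -k."""
    return 1 - math.exp(-k)


def k_for_coverage(p):
    """Value of k whose contour ellipse encloses probability p (0 < p < 1)."""
    return -math.log(1 - p)


def simulated_coverage(k, rho, trials=20000, seed=1):
    """Monte Carlo check using standardized variables u, v with correlation rho."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u = z1
        v = rho * z1 + math.sqrt(1 - rho * rho) * z2
        # Left side of the contour condition divided by 2(1 - rho^2).
        q = (u * u - 2 * rho * u * v + v * v) / (2 * (1 - rho * rho))
        if q <= k:
            inside += 1
    return inside / trials


k95 = k_for_coverage(0.95)
print(k95, coverage(k95), simulated_coverage(k95, rho=0.6))
```

Because the enclosed volume depends only on k, one value of k serves every bivariate normal distribution once the variables are standardized.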


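Fig. 3.14. Bivariate Normal Density Surface with Contour Ellipses

Letting

$$u = \frac{x - \mu_x}{\sigma_x} \quad \text{and} \quad v = \frac{y - \mu_y}{\sigma_y}$$

we find that Eq. (3.57) reduces to

$$f(u, v) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left[-\frac{u^2 - 2\rho uv + v^2}{2(1-\rho^2)}\right] \qquad (3.60)$$

since $dx\,dy = \sigma_x\sigma_y\,du\,dv$. This is called the standardized bivariate normal
density function. It involves only one parameter, $\rho$, and has been
tabulated; see [17].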

Observe that when $\rho = 0$ the standard normal density function f(u, v)
factors into the product f(u)·f(v) of two standard density functions; that is,

$$\frac{1}{2\pi} e^{-(u^2+v^2)/2} = \frac{1}{\sqrt{2\pi}} e^{-u^2/2} \cdot \frac{1}{\sqrt{2\pi}} e^{-v^2/2}$$

The student should be cautioned at this point. Even though the variables
in a joint normal distribution are independent when $\rho = 0$, this is not true
of joint distributions in general. However, if variables are independently
distributed, it necessarily follows that $\rho = 0$ (see Exercise 3.65). The
distribution of u for any fixed v is the same, no matter what value of v is
selected; that is, the distribution of u is independent of v. Likewise, we see
that the distribution of v is independent of u. Thus, it is natural to say
that u and v are independently distributed.
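The caution above can be illustrated with a small discrete case. The distribution below is a hypothetical example, not one taken from the text: x takes the values −1, 0, 1 with equal probability and y is the square of x. The sketch computes the covariance exactly with fractions and then tests the product condition of Definition (3.56).

```python
from fractions import Fraction

# Hypothetical example (not from the text): x is -1, 0 or 1 with equal
# probability and y = x**2, so y is completely determined by x.
third = Fraction(1, 3)
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

# Marginal density functions obtained by summing over the other variable.
marg_x, marg_y = {}, {}
for (x, y), p in joint.items():
    marg_x[x] = marg_x.get(x, 0) + p
    marg_y[y] = marg_y.get(y, 0) + p

mean_x = sum(x * p for x, p in marg_x.items())
mean_y = sum(y * p for y, p in marg_y.items())
cov = sum((x - mean_x) * (y - mean_y) * p for (x, y), p in joint.items())

# Independence requires f(x, y) = f(x) * f(y) for every pair of values,
# including pairs whose joint probability is zero.
independent = all(
    joint.get((x, y), 0) == marg_x[x] * marg_y[y]
    for x in marg_x
    for y in marg_y
)
print(cov, independent)  # prints: 0 False
```

The covariance, and hence the correlation coefficient, is zero, yet the joint density is not the product of the marginal densities.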


3.4.2. A Discrete Bivariate Distribution 

We discuss a discrete distribution in order to show how the properties 
in Sect. 3.4 for a continuous variable may be used as guides in the discrete 
case. Consider the following discrete bivariate function:

$$f(x, y) = \begin{cases} f(1, 1) = 0.1 \\ f(1, 2) = 0.1 \\ f(1, 3) = 0.2 \\ f(2, 1) = 0.1 \\ f(2, 2) = 0.2 \\ f(2, 3) = 0.3 \\ 0 \text{ for all other pairs of values of } x \text{ and } y \end{cases} \qquad (3.61)$$

First, we note that this is a density function, since (1) f(x, y) is single-valued
and nonnegative, (2)

$$\sum_{x=1}^{2} \sum_{y=1}^{3} f(x, y) = 1$$

and (3) the sum of any subset of these six functional values gives the
probability of obtaining the corresponding pairs (x, y). For example,

$$\sum_{x=1}^{1} \sum_{y=2}^{3} f(x, y) = f(1, 2) + f(1, 3) = 0.1 + 0.2 = 0.3$$

which is the probability of obtaining the pair (1, 2) or (1, 3).

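We may think of the bivariate distribution function $F(x_r, y_s)$ (r = 1,
. . . , k; s = 1, . . . , l; $x_r < x_{r+1}$; $y_s < y_{s+1}$) for the six discrete points of
Eq. (3.61) as given by

$$F(x_r, y_s) = \begin{cases} F(1, 1) = f(1, 1) = 0.1 \\ F(1, 2) = f(1, 1) + f(1, 2) = 0.2 \\ F(1, 3) = f(1, 1) + f(1, 2) + f(1, 3) = 0.4 \\ F(2, 1) = f(1, 1) + f(2, 1) = 0.2 \\ F(2, 2) = f(1, 1) + f(1, 2) + f(2, 1) + f(2, 2) = 0.5 \\ F(2, 3) = 1.0 \end{cases} \qquad (3.62)$$

However, in mathematical considerations it is customary to define the
distribution function F(x, y) for all pairs of real values of x and y. Thus, in
general, we define the bivariate distribution function F(x, y) as follows:

$$F(x, y) = \begin{cases} 0, & \text{when } x < x_1 \text{ or } y < y_1 \\ F(x_r, y_s), & \text{when } x_r \le x < x_{r+1} \text{ and } y_s \le y < y_{s+1} \\ 1, & \text{when } x \ge x_k \text{ and } y \ge y_l \end{cases} \qquad (3.63)$$

it being understood in the second case that $x_{k+1} = \infty$ and $y_{l+1} = \infty$
and that $x_k$ and $y_l$ do not appear together.

For example, using the density function (3.61), we obtain

$$F(x, y) = \begin{cases} 0, & \text{when } x < 1 \text{ or } y < 1 \\ F(1, 1), & \text{when } 1 \le x < 2 \text{ and } 1 \le y < 2 \\ F(1, 3), & \text{when } 1 \le x < 2 \text{ and } 3 \le y < \infty \\ \text{etc.} \\ 1, & \text{when } x \ge 2 \text{ and } y \ge 3 \end{cases} \qquad (3.64)$$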

The means, variances, covariance, and correlation coefficient for the
density function (3.61) are obtained as follows:

$$\mu_x = \sum_{x=1}^{2} \sum_{y=1}^{3} x f(x, y) = 1f(1, 1) + 1f(1, 2) + 1f(1, 3) + 2f(2, 1) + 2f(2, 2) + 2f(2, 3) = 1.6$$

$$\mu_y = \sum_{x=1}^{2} \sum_{y=1}^{3} y f(x, y) = 2.3$$

$$\sigma_x^2 = \sum_{x=1}^{2} \sum_{y=1}^{3} (x - 1.6)^2 f(x, y) = (0.36)(0.4) + (0.16)(0.6) = 0.240$$

$$\sigma_y^2 = \sum_{x=1}^{2} \sum_{y=1}^{3} (y - 2.3)^2 f(x, y) = 0.610$$

$$\sigma_{xy} = \sum_{x=1}^{2} \sum_{y=1}^{3} (x - 1.6)(y - 2.3) f(x, y)$$
$$= (-0.6)(-1.3)f(1, 1) + (-0.6)(-0.3)f(1, 2) + (-0.6)(0.7)f(1, 3) + (0.4)(-1.3)f(2, 1) + (0.4)(-0.3)f(2, 2) + (0.4)(0.7)f(2, 3)$$
$$= 0.020$$

and

$$\rho = \frac{0.020}{\sqrt{(0.240)(0.610)}} = 0.052$$

Thus, it follows that the variables x and y are not independently distributed
since $\rho \ne 0$.
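The arithmetic for the density function (3.61) is easy to reproduce. The following sketch simply applies the discrete analogues of the definitions of Sect. 3.4; the dictionary layout and the function name `F` are illustrative choices.

```python
import math

# Density function (3.61): probabilities at the six points (x, y).
f = {(1, 1): 0.1, (1, 2): 0.1, (1, 3): 0.2,
     (2, 1): 0.1, (2, 2): 0.2, (2, 3): 0.3}

total = sum(f.values())
mean_x = sum(x * p for (x, y), p in f.items())
mean_y = sum(y * p for (x, y), p in f.items())
var_x = sum((x - mean_x) ** 2 * p for (x, y), p in f.items())
var_y = sum((y - mean_y) ** 2 * p for (x, y), p in f.items())
cov_xy = sum((x - mean_x) * (y - mean_y) * p for (x, y), p in f.items())
rho = cov_xy / math.sqrt(var_x * var_y)


def F(x, y):
    """Distribution function: total probability at points (s, t) with s <= x, t <= y."""
    return sum(p for (s, t), p in f.items() if s <= x and t <= y)


print(round(mean_x, 3), round(mean_y, 3))
print(round(var_x, 3), round(var_y, 3), round(cov_xy, 3), round(rho, 3))
print(round(F(1, 3), 3), round(F(2, 2), 3), round(F(1.5, 2.5), 3))
```

Because `F` accepts any real pair, it also covers arguments that fall between the listed points, in the manner of the general definition of the distribution function.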

3.4.3. Exercises 

The first exercises are intended to lead to a better understanding of the 
ideas presented in Sect. 3.4; the last refer to all topics of this chapter, as 
well as to certain related topics. It is to be understood that /(x, y) = 0 for 
all pairs of x and y values not mentioned in the exercises. 

3.54. Let the joint distribution of the two variables 0 ≤ x < 1 and 0 ≤ y < 1
have the joint density function f(x, y) given by f(x, y) = 1. (a) Find the
joint distribution function F(x, y). (b) Find the means, variances, and
covariance. (c) Calculate F(1, 1); P[x < 1]. (d) Calculate P[(x + y)
< 1]; P[2x > y]; P[(x² + y²) < 4].

3.55. Let $f(x, y) = ce^{-(x+y)}$, x ≥ 0, y ≥ 0. (a) Find c so that f(x, y) is a
density function. (b) Find F(x, y). (c) Calculate F(1, 1); P[x < 1].
(d) Evaluate P[(x + y) < 1]; P[x > y]. (e) Find the means, variances,
and covariance. Note that x and y are independently distributed, since
f(x, y) = f(x)·f(y), where $f(x) = e^{-x}$ and $f(y) = e^{-y}$ are marginal density
functions.

3.56. Let f(x, y) = c, 0 < x < 1, 0 < y < x. (a) Find c so that f(x, y) is a
density function. (b) Determine the means and variances. (c) Find F(x, y).

3.57. Let f(x, y) = 10,000, 0.49 < x < 0.50, 0.50 < y < 0.51. Suppose we
interpret x to be the diameter in inches of shafts made by one machine and
y to be the inside diameter in inches of bushings made by another machine. 
If we assume that the bushing fits the shaft satisfactorily when the dif- 
ference in their diameters is between 0.0016 and 0.0064 in., what pro- 
portion of the shafts and bushings will fit? 

3.58. Give an illustration in which the joint density function in Exercise 3.55 
might be used as a model. 

3.59. (a) Find c so that f(x, y) = cxy, 0 < x < 1, 0 < y < x, is a joint
density function. (b) Find F(x, y). (c) Compute the means and variances.
(d) Calculate F(I, 1). P[x 4J 

3 60 Let the joint density function of the two vanables x S 0 and y k 0 be 
given by f(x, y) =» «>?*•**•' (a) Find the value of e (b) Calculate 
Fix < \,y< IJ Plx < I) (c) Find Fix, y) 5f possible 
3.61. Consider the following discrete bivariate distribution:

$$f(x, y) = \begin{cases} 2c, & \text{when } x = 1 \text{ and } y = 1, 2, 3 \\ c, & \text{when } x = 2 \text{ and } y = 1, 3 \\ 4c, & \text{when } x = 2 \text{ and } y = 2 \end{cases}$$

(a) Find c so that f(x, y) is a density function. (b) Compute the means,
variances, and correlation coefficient.
Note. Marginal density functions f(x) and f(y) do not exist such that
f(x, y) = f(x)·f(y), even though $\rho = 0$.
(c) Determine F(x, y) and find F(1.5, 2.5).

3.62. Consider the following discrete joint distribution:

Al. M) » At. i. I) « Al. 2> 2) = c, 

AU ». 2? * A2. 2> « /<2. 2. 2> ar. 

f(2, 1, 1) = f(2, 2, 1) = 3c

(a) Find c so that $f(x_1, x_2, x_3)$ is a density function. (b) Compute the
means and variances. (c) Compute the covariances and correlation
coefficients. (d) Find F(1, 1, 2); F(1, 2, 1); F(2, 2, 1); F(1, 1.5, 2).

3 63 Ut 

Ax y, r) x20. y^O, ti:0 

(a) Find e so that /(x y i) ts a density funaion (b) Find F(x, y, *) (c) 
Find means and vanances (d) Determine the covariances 
3.64. Assume that the heights x and weights y of adult human males are
distributed with joint normal density function n(x, y; 68, 160, 3, 20, 0.5).
(a) Write the expression for the joint density function. (b) Determine
the function of y obtained by letting x = 68. Is this a normal function?
Is it a normal density function? (c) Find the major axis of the contour
ellipse.

3.65. If x and y are independently distributed with density function f(x, y),
prove that $\rho = 0$.

3.66. The definitions and properties of moments and moment generating
functions for discrete variables are similar to those given for the
continuous case in Exercises 3.45, 3.47, and 3.52. (a) Show that the MGF of
a variable having the Poisson distribution with mean $\mu$ is given by
$M_x(t) = e^{\mu(e^t - 1)}$. (b) Using Eq. (3.45), show that the mean and variance
are both equal to $\mu$.

3.67. A function which provides a generalization of the factorial and which
is used extensively in distribution theory is given by the definite integral

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx, \quad \alpha > 0 \qquad (3.65)$$

This is called the gamma function. (a) Show that $\Gamma(\alpha + 1) = \alpha\Gamma(\alpha)$.
Hint. Use integration by parts.

(b) If k is a positive integer less than $\alpha$, show that
$\Gamma(\alpha + 1) = \alpha(\alpha - 1) \cdots (\alpha - k)\,\Gamma(\alpha - k)$. Further, if $\alpha = n$ is an integer, it follows that
$\Gamma(n + 1) = n!$ (c) In statistical applications $\alpha$ is usually an integer or a
multiple of one-half. Thus, if we knew $\Gamma(\tfrac{1}{2})$, we could compute $\Gamma(\alpha + 1)$
for almost any value needed. Show that $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$. (d) Find $\Gamma(1)$, $\Gamma(\tfrac{3}{2})$,
$\Gamma(2)$, $\Gamma(\tfrac{5}{2})$, $\Gamma(3)$. (e) Computing $\Gamma(\alpha + 1)$ by using the above relations is
impractical for large values of $\alpha$. Thus, log $\Gamma(n + 1)$ has been extensively
tabulated. For very large $\alpha$, the following approximation, known as
Stirling's formula, can be used: $\Gamma(\alpha + 1) = \alpha! \approx \sqrt{2\pi}\,\alpha^{\alpha + 1/2} e^{-\alpha}$. The
approximation gets better as $\alpha$ increases. Use logarithms and Stirling's
formula to find r(lOl) and Ff^). 

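3.68. The density function $f(x; \alpha, \beta)$ for the gamma distribution is given by

$$f(x; \alpha, \beta) = \frac{x^{\alpha} e^{-x/\beta}}{\Gamma(\alpha + 1)\,\beta^{\alpha+1}}, \quad x > 0,\ \beta > 0,\ \alpha > -1 \qquad (3.66)$$

Changing $\beta$ changes the scale of the axes. Thus, letting $y = x/\beta$, we obtain

$$f(y; \alpha) = \frac{y^{\alpha} e^{-y}}{\Gamma(\alpha + 1)}, \quad y > 0,\ \alpha > -1 \qquad (3.66a)$$

which may be used as the gamma density function. (a) Prove that
$f(x; \alpha, \beta)$ is a density function. (b) Show that the MGF of the variable x with
density function (3.66) is given by $M_x(t) = (1 - \beta t)^{-(\alpha+1)}$, provided
$t < 1/\beta$. (c) Show that $\mu = (\alpha + 1)\beta$ and $\sigma^2 = (\alpha + 1)\beta^2$. (d) Show that
the distribution function $F(x; \alpha)$ is given by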


$$F(x; \alpha) = \frac{\Gamma(x; \alpha)}{\Gamma(\alpha + 1)}$$

where

$$\Gamma(x; \alpha) = \int_0^x y^{\alpha} e^{-y}\,dy$$

The function $F(x; \alpha) = I(x; \alpha)$ is called the incomplete gamma function
and has been extensively tabulated by Karl Pearson [22]. We shall show
later (in Chap. 7) that the very useful chi-square distribution is an
incomplete gamma distribution.

3.69. A function which is important in distribution theory is defined by the
definite integral

$$B(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1 - x)^{\beta-1}\,dx \qquad (3.67)$$

where $\alpha$ and $\beta$ are constants greater than zero. This is called the
beta function. (a) Prove that $B(\alpha, \beta) = B(\beta, \alpha)$. (b) Prove that

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$

Hint. Let $x = y^2$ in Eq. (3.65), write $\Gamma(\alpha)\Gamma(\beta)$ as a double integral,
and then transform to polar co-ordinates, following the pattern of
evaluating $A^2$ in Sect. 3.2.2.

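3.70. The density function $f(x; \alpha, \beta)$ for the beta distribution is given by

$$f(x; \alpha, \beta) = \frac{x^{\alpha}(1 - x)^{\beta}}{B(\alpha + 1, \beta + 1)}, \quad 0 < x < 1,\ \alpha > -1,\ \beta > -1$$

(a) Prove that $f(x; \alpha, \beta)$ is a density function. (b) The MGF of the beta
distribution does not have a simple form, but the moments $\mu'_k$ can be
found directly. Show that

$$\mu'_k = \frac{\Gamma(\alpha + \beta + 2)\,\Gamma(\alpha + k + 1)}{\Gamma(\alpha + k + \beta + 2)\,\Gamma(\alpha + 1)}$$

Thus, determine $\mu$ and $\sigma^2$. (c) Show that the distribution function
$F(x; \alpha, \beta)$ is given by

$$F(x; \alpha, \beta) = \frac{B(x; \alpha, \beta)}{B(\alpha + 1, \beta + 1)}$$

where

$$B(x; \alpha, \beta) = \int_0^x t^{\alpha}(1 - t)^{\beta}\,dt, \quad 0 < x < 1$$

The function $F(x; \alpha, \beta) = I(x; \alpha, \beta)$ is called the incomplete beta function
and has also been extensively tabulated by Karl Pearson [21]. We shall
show later (in Chap. 9) that the very useful F distribution is an incomplete
beta distribution.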


3.71. It is desirable to have a reasonably simple mathematical expression
which can generate models for most of the distributions found in applied
statistics. Karl Pearson [18, 19, 20] has proposed the differential equation

$$\frac{df}{dx} = -\frac{(x + a)\, f}{b + cx + dx^2}$$    (3.69)

for this purpose. This equation does have a large family of solutions
which includes many distributions found in practice. By letting the
constants a, b, c, d assume different relations to each other, Pearson
classified the solutions of Eq. (3.69) into 12 families of curves, those of
one family being called Type I curves, those of a second family being
called Type II curves, etc. The beta distributions represent Type I and
Type II (when $\alpha = \beta$) curves, the gamma distributions are Type III
curves, and the normal distributions are Type VII curves. See Craig
[8], Elderton [10], and Kendall [14] for complete accounts of the
Pearson types. (a) Show that when $a = -\mu$, $b = \sigma^2$, and $c = d = 0$,
the solution of Eq. (3.69) is $f = n(x; \mu, \sigma)$. (b) Determine the solution
of Eq. (3.69) when a = b = d = 0. (c) Determine the solution of Eq.
(3.69) when a = b = c = 0.

Note 1. It can be shown that the constants a, b, c, d in Eq. (3.69)
can be expressed in terms of the first four moments. Thus, only the
first four moments are required in order to specify any density function
satisfying Eq. (3.69). Thus, an approximation to any density function
in the family may be obtained by computing estimates of the first four
moments from a sample of observations.

Note 2. Other methods for expressing a large family of distributions
in simple form have been presented. Among those the best-known method
is called the Gram-Charlier series, which states that under fairly general
conditions a wide class of density functions f(x) may be expressed as an
infinite series of terms made up of the normal density function and its
derivatives. That is, $f(x) = a_0 n_0(t) + a_1 n_1(t) + \cdots$, where
the $a_j$ are constants, $t = (x - \mu)/\sigma$, and $n_j(t)$ is the jth derivative of the
standard normal density $n(t; 0, 1) = n_0(t)$. For more on the Gram-Charlier
series see Refs. [2, 5, 6, 7, 8, 15, 16].

PROBABILITY


The concept of probability is treated from the relative-frequency point 
of view. Some rules for computing probabilities are discussed. 

4.1. INTRODUCTION 

In the earlier chapters we have been concerned with statistical distribu- 
tions from a descriptive point of view. We have indicated that distributions 
found in practice may be represented by simple mathematical forms which 
can be characterized by a few parameters. So long as these parameters are 
known and the nature of the distribution is known, the mathematical expres- 
sion of the distribution can be explicitly given. However, these parameters 
are seldom known, and we are faced with the question, “How can we deter- 
mine which mathematical expression is correct (or most appropriate) when 
only a sample of the population is available?” Clearly, we would not expect, 
on the basis of the information obtained from a sample, to make any 
statement concerning the nature of the distribution of the population with 
the same certainty as we could if we had the whole population. But it is 
possible to make statements with less certainty in terms of probability. 

4.2. DEFINITIONS OF PROBABILITY 

In Chap. 3 we introduced formal properties of “mathematical probabil- 
ity” (formerly called “probability”) of real numbers assigned to events, 
outcomes of experiments. There we did not discuss probabilities of events 
or the problem of assigning a meaningful probability. For example, in the 
axiomatic definition of probability given in Eq. (3.2), each of the assignments
(and millions more) of probability in Table 4.1 to a number on the face
of a balanced die is satisfactory. However, anyone who has to use the results
of experimentation to take action knows that the four cases are not equally
satisfactory. Further, most investigators would prefer to use results of
previous experiments along with certain assumptions to estimate and assign
probabilities to events. Thus, in this chapter we give an operational
definition of probability, as contrasted with the mathematical probability of
Chap. 3, which is very useful in spite of its obvious shortcomings.

It can be shown that the basic properties suggested by and derived from
the frequency approach to probability are like those obtained from the
axiomatic approach to mathematical probability. Furthermore, such an
approach allows for an interpretation of probability which is familiar to the
experimenter and at the same time satisfies the properties of mathematical
probability. From this point on we shall think of the relative-frequency
concept of probability of events as being a special meaningful interpretation
of mathematical probability assigned to real numbers associated with events.
We discuss the relative-frequency concept of probability in terms of
events, understanding that the x used in Chap. 3 is a real-valued function
defined over all the events of an experiment. Hence the relative-frequency
approach to probability is easily expressed in terms of x, that is, in terms
of numbers assigned to events. An investigator in a particular experimental
situation usually knows what number to assign to an event. Hence he would
have no difficulty in changing probability statements about events to
probability statements about real numbers x.

It is not easy to give the term probability a precise meaning which would
satisfy everyone. Some would want the term to be characterized so as to
include all the ideas generally associated with the word probability. Most
statisticians are concerned with the term as it is used in a particular sense.
For example, they want to know how sample means or sample standard
deviations vary from sample to sample if it is assumed that samples of the
same size are drawn from the same population. These facts and others
lead to disagreement as to how best to characterize the term. We shall take
the somewhat restricted point of view of many statisticians and think of
probability of an event as the “limit of the relative frequency with which
it occurs.”

In order to arrive at a clear understanding of what is meant by “limit 
of the relative frequency,” we first consider two illustrations and a very 
restricted intermediate interpretation of probability. 

Example 4.1. Consider the simple experiment of tossing a coin. We 
assume that the only two meaningful outcomes are the occurrence of a head 
H or the occurrence of a tail T, and that they are mutually exclusive, since 
both sides of the coin cannot turn up simultaneously. Of course, after a toss 
it is conceivable that the coin might stand on edge or vanish, but we speak 
of the occurrence of a head or a tail as the only two possible outcomes. 
Furthermore, we assume the coin to be well balanced and to be tossed fairly. 
Thus, on a given toss we would say that there is nothing to favor the occur- 
rence of a head more or less than the occurrence of a tail (that is, the two 
possible alternatives are equally likely) and that the chance of either occurring
is 1/2. If obtaining a head is called event E, we say that the chance of event
E is 1/2.

Example 4.2. Consider the chance of drawing a certain type of card from
a pack of 52 playing cards. We assume that we have the same chance of
drawing any one of the 52 cards and that we cannot draw more than one
card in one draw; that is, the 52 possible alternatives are equally likely and
mutually exclusive. Thus, if the pack is well shuffled and the selection of a
card is random, then the chance of getting one particular card is 1/52. The
chance of getting an ace in one draw is 4/52 = 1/13, since there are four such
cards in a pack of 52. Furthermore, the chance of getting a heart in one
draw is 13/52 = 1/4. If obtaining a heart in one draw is called event E, then
the chance of event E is 1/4.

These two illustrations lead us to the following classical concept of pro- 
bability of events for simple experiments; 

If an experiment can result in N mutually exclusive and equally 
likely possible outcomes, n of which correspond to the occurrence 
of an event E, then the probability of the event E is n/N, or briefly 
P[E] = n/N.
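
The classical definition amounts to counting outcomes, and it can be sketched
in a few lines of Python. The helper `classical_probability` and the way the
pack of cards is represented are illustrative assumptions of ours; the sketch
simply reproduces the chances found in Examples 4.1 and 4.2.

```python
from fractions import Fraction
from itertools import product


def classical_probability(outcomes, event) -> Fraction:
    """Return n/N: N equally likely outcomes, n of which satisfy `event`."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


# A pack of 52 cards as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

p_head = classical_probability(["H", "T"], lambda side: side == "H")
p_ace = classical_probability(deck, lambda card: card[0] == "A")
p_heart = classical_probability(deck, lambda card: card[1] == "hearts")

print("P[head]  =", p_head)
print("P[ace]   =", p_ace)
print("P[heart] =", p_heart)
```

Exact fractions are used so that the results can be compared directly with
ratios such as 4/52 = 1/13.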

Note that, even though the term “equally likely” is a primitive or indefinable
term applied to an experiment, this approach to probability may be criticized
as being circular. In such case, the reader may prefer to think of “equally
likely” as being synonymous with the term “equally probable,” and, hence,
think of probability of an event E as being expressed as the sum of n equal
probabilities 1/N assigned to a set of N mutually exclusive and exhaustive
experimental outcomes.

The classical interpretation certainly agrees with the usual concept of
probability as applied in simple experiments, but it leaves something to be
desired in many more complex experiments (or investigations). For example,
this approach does not apply when it is impossible to count the number of
equally likely outcomes, such as the number of cars on the Merritt Parkway
tomorrow or the number of people who will die from cancer in the twentieth
century. Thus, the classical concept is inadequate in determining the
probability of a light bulb's being defective or of an unborn baby's being
female or of an American's living more than one hundred years. Furthermore, it
may be that the outcomes in an experiment are not equally likely. For
example, a coin may be biased in favor of “heads” or a die may be “loaded.”
Hence, we sense the need for another practical concept of probability, one
that does not depend on a complete a priori analysis. This suggests an
empirical approach.

Suppose that in a sequence of n trials of a specified experiment an event
E occurs $n_E$ times. The ratio $n_E/n$ is called the relative frequency of the
event E and is denoted by R[E]. Then the probability P[E] of the event E is the
limit approached by R[E] as n increases indefinitely, it being assumed a limit
value exists. We call this the relative-frequency concept of probability. Such
a definition is an operational definition, a type often used in science and
engineering, and is useful because it allows empirical estimates of probability
to be obtained from relative frequencies. (It should be noted for this
interpretation of probability that an operational limit is not the same as the
usual mathematical limit. The following three examples and other material
developed later should make the distinction increasingly clear.)
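
The operational definition can be illustrated by simulating tosses and
recording R[E] at a few values of n. This is a sketch only: a pseudo-random
generator stands in for a physical coin, and the function name, the seed, and
the chosen checkpoints are arbitrary assumptions of ours.

```python
import random


def running_relative_frequency(n_trials, checkpoints, p_head=0.5, seed=0):
    """Simulate tosses and return {n: R[E]} at the requested values of n."""
    rng = random.Random(seed)      # fixed seed so that the run is repeatable
    wanted = set(checkpoints)
    heads = 0
    result = {}
    for n in range(1, n_trials + 1):
        if rng.random() < p_head:  # event E: a head on this toss
            heads += 1
        if n in wanted:
            result[n] = heads / n  # relative frequency after n trials
    return result


checkpoints = [20, 200, 2000, 20000]
freqs = running_relative_frequency(20000, checkpoints)
for n in checkpoints:
    print(f"n = {n:6d}   R[E] = {freqs[n]:.4f}")
```

A different seed gives a different sequence of ratios, which is exactly the
element of uncertainty that separates the operational limit from the
mathematical one.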

It is clear that the relative-frequency concept of probability is based on
observational evidence, and that the classical concept is not. For example,
according to the classical concept we say that the probability of obtaining
a head in a toss of an ideal coin is 1/2 without even tossing the coin, but
according to the relative-frequency approach we would need to toss the coin
many times before arriving at a satisfactory estimate of the probability. With
this fact in mind, we use the results obtained from several trials to take a
closer look at the meaning of probability as given in the relative-frequency
interpretation.

Example 4.3. A quarter, believed to be ideal, was tossed 200 times with
the following results (reading across rows):

HHTHHHHTHHTHHHHHTTHTHTHHTHHTHT
TTTHHTHTHHHTTHTTHHHHHHHHTHTHTH
TTTTT hhthtttthhtththht tthhhhkft
HHTHTHTHHHTHTTTTTHTHHTHHHHHTTH
TTHTHHHTTTHTHHTHHHTHHTHTTTTTTT
HTTHHTTTHTHHHTTTHHHTTHTTHTTTTH
HTTHHHTTTTHHHTHTTTTH

We will not say whether 200 tosses represents “many trials.” Perhaps it does
not. If 200 tosses do represent many trials, then, according to the
relative-frequency concept, the probability of event E, occurrence of a head on
a single toss, is nearly 104/200 = 0.52. To illustrate the effect that
increasing n has on the relative frequencies R[E], we compute the ratio of
heads after the first 20 tosses, the first 40 tosses, etc. to obtain
R[E] = 14/20 = 0.70, 0.63, 0.63, 0.56, 0.59, 0.57, 0.57, 0.54, 0.53, 0.52.
Thus, we see that as the number of trials increases the corresponding ratios
fluctuate less and less and usually get closer and closer to 0.50, which is the
assumed correct probability.

Note. We computed ratios after each 20 new trials in order to see the 
nature of the fluctuations in the sequence without having to look through 
so many terms. Selecting ratios at regular intervals could lead to wrong 
conclusions, for it might happen that we would get a sequence of large (or 
small or unusual) ratios.

There is another sense in which we may think of the relative-frequency 
interpretation and which gets us closer to the way the statistician usually 
thinks of probability. 

Example 4.4. Suppose we toss a quarter and record the number of heads 
(a) every 20 tosses for 15 times, (b) every 200 tosses for 15 times, and (c) 
every 2000 tosses for 15 times and obtain the results shown in Table 4.2. 


Table 4.2
Number of Heads per Group

                 Number of Tosses
Group        20       200      2000
  1          14       104      1010
  2          11        91       990
  3          13        99      1012
  4           7        96       986
  5          14        99       991
  6          10       108       988
  7          11       101      1004
  8           6       101      1002
  9           9       101       976
 10           9       110      1018
 11           9       108      1021
 12           6       103      1009
 13           6        98      1000
 14          10       101       998
 15          13       109       988
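
As a check on Table 4.2, the percentage of heads in each group, together with
its smallest and largest value for each group size, can be computed directly.
The sketch assumes the entries as read from the table (in particular, the
tenth entry for 200 tosses is taken to be 110); the variable names are ours.

```python
# Number of heads in each of the 15 groups, keyed by tosses per group.
heads = {
    20: [14, 11, 13, 7, 14, 10, 11, 6, 9, 9, 9, 6, 6, 10, 13],
    200: [104, 91, 99, 96, 99, 108, 101, 101, 101, 110, 108, 103, 98, 101, 109],
    2000: [1010, 990, 1012, 986, 991, 988, 1004, 1002, 976, 1018, 1021,
           1009, 1000, 998, 988],
}

percent_range = {}
for tosses, counts in heads.items():
    percents = [100.0 * count / tosses for count in counts]
    percent_range[tosses] = (min(percents), max(percents))

for tosses, (low, high) in percent_range.items():
    print(f"{tosses:5d} tosses per group: {low:.2f}% to {high:.2f}% heads")
```

The spread of the percentages narrows as the number of tosses per group
grows, which is the behavior the table is meant to display.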


The percentages of heads are shown in Table 4.3. In the groups with 20 
tosses the percentage of heads varies (at random) between 30 and 70; in the 
groups with 200 tosses it varies between 45.5 and 55.0, and in the groups with 




Fig. 4.1. Dot Frequency Diagrams for Percentage of Heads

Note. It may take many more than 15 groups before the fixed value can
be approximated with sufficient accuracy. Fewer groups are required as the
number of tosses per group increases.

We have observed with the coin-tossing experiment that relative frequencies
may be used to estimate probabilities from observational results. The
relative-frequency concept of probability may also be used with more realistic
data. For example, if we found that 2540 peas out of 10,000 of a certain
strain were pale green or yellow, we would estimate the probability as
0.254 ≈ 0.25 that peas of this strain are pale green or yellow; if a worker
examined 8250 machine parts during the day and found 174 defective, we
would estimate the probability as 0.021 that a part which the factory
produces is defective.

In each of the above illustrations the variable is discrete. Now we consider 
an example in which the variable is continuous. 

Example 4.5. Assume that we have a roulette wheel with a pointer which 
is equally likely to stop at any position (point) of the wheel. (Other circular 
objects, such as the face of a clock with the minute hand as pointer, could 
be used.) Using the classical concept of probability, we would say that 
the probability that the pointer falls within a given interval on the scale of the 
wheel is equal to the ratio of the length of the interval to the length of the 
whole scale. Now if we think of the length of the interval converging to zero, 
then, according to the relative-frequency concept, the probability of the 
pointer's stopping at a given point selected in advance of the experiment is zero. 
Thus, we see that even though the probability is zero that the pointer will 
stop at any single point, selected in advance, the event if regarded as 
accurately definable is not an impossible event. 

This example indicates that the classical concept of probability can be 
applied to a certain type of problem with a continuous variable, particularly 
when each possible outcome is assumed to be equally likely. However, there 
are some difficulties in relating empirical results to true probabilities. For 
we have that old problem, common to all mathematical theories of physical 
phenomena, of passing through an infinite number of steps. In addition to 
this, we have another problem peculiar to probability theory. In the relative- 
frequency interpretation, we do not say 

$$\lim_{n \to \infty} R[E] = P[E]$$

in the mathematical sense of limit; that is, we do not say that for an arbitrarily
small positive real number $\epsilon$ there exists a large n, say $n'$, such that
$|R[E] - P[E]| < \epsilon$ for every $n > n'$. Instead, we say, operationally, that “the
probability P[E] is the limit approached by R[E] as n increases
indefinitely.” Such a statement allows an element of uncertainty and thus, in a
sense, is defined within the framework of the concept itself. Thus, even though 
the usual desirable properties of probability hold for the relative-frequency 
interpretation, we cannot rigorously derive them from such a definition. 

The relatively simple interpretations of probability given in this section,
even though they have some obvious limitations, are adequate for most
practical problems. In situations in which these interpretations are not
broad enough to give probability of an event, we simply assume that it is
some fixed but unknown value which obeys the usual properties of
probability given in Sect. 4.3 (and Chap. 3).

There are other ways of interpreting probability, but ours has the
advantage of being simple. For other more complete discussions of probability
see the following books devoted to its theory: [1, 2, 3, 4, 5, 6, 8, 9, 10, 11,
14, 16, 17]. Halmos [7] gives an elementary discussion of a mathematical
concept of probability, and Carnap [2] lists (on 15 pages) a selected
bibliography.

4.2.1. Exercises

4.1. What are the extreme values of the probability of an event? Give an
illustration of each.

4.2. A balanced die is fairly tossed. Let the event A denote the occurrence
of an odd number, B denote the occurrence of an even number, C denote
the occurrence of a number greater than 3, D denote the occurrence of
a number less than 3, and E denote the occurrence of the number 3.
(a) Find P[A], P[B], P[C]. (b) Which of these events are equally likely?
(c) Which of these events are mutually exclusive? (d) Find the probability
of event C or event D. (e) Find the probability of event B or event C.

4.3. A card is drawn from a well-shuffled deck of playing cards. (a) What is
the probability of obtaining a jack or queen? (b) What is the probability
of not obtaining a heart? (c) What is the probability of obtaining a heart
or an ace?

4.4. Two balanced coins are tossed. (a) List the equally likely occurrences.
(b) What is the probability of obtaining two heads? (c) Is the occurrence
of a head mutually exclusive of the occurrence of a tail?

4.5. How would you estimate the probability that the acidity level of a sample
of Virginia soil selected at random exceeds a pH level of 6.0, assuming
Table 2.4 to be an accurate distribution? What is this probability?

4.6. Using Table 2.5, explain how you would estimate the probability that
a female fruit fly (of this strain) caught at random (see Chap. 5 for more
about this term) has more than 20 bristles on the sixth abdominal sternite.

4.7. Estimate the probability that the excess yardage of 100-denier acetate
yarn of a bobbin selected at random is less than 20 yards, assuming that
the table in Exercise 2.7 is an accurate distribution.

4.8. How would you estimate experimentally the probability of getting an
appointment within two weeks with a certain busy medical doctor?

4.9. A coin may be used to connect the repeated-trials concept of probability
to the equally likely concept of probability. Give an illustration in which
the equally likely concept of probability seems most appropriate; the
repeated-trials concept seems most appropriate.


4.3. PROPERTIES OF PROBABILITY 

We now present rules for computing probabilities of compound or 
related events in terms of probabilities of simple events. For simplicity, 
we consider properties of probability involving only two events associated 
with an experiment, properties of probability involving more than two 
events being straightforward generalizations of properties of probability for 
two events. Furthermore, these properties are discussed and illustrated in 
terms of the classical interpretation, even though they hold in much more 
general situations. 

Let I denote every possible outcome of an experiment, and call it the
universal event. If A is an arbitrary event of an experiment, then the event
“not A,” denoted by $\bar{A}$, is that event of the experiment in which A does not
occur. $\bar{A}$ is sometimes called the complementary event of A. We now define
other events of an experiment in terms of simple events A and B, using the
connectives “and” and “or,” which are denoted by $\cap$ and $\cup$, respectively.

Definition 4.1. Let A and B be any two arbitrary events of an experiment.

(a) The event “A and B,” denoted by $A \cap B$ or $B \cap A$, is that event
in which event A and event B occur together.

(b) The event “A or B,” denoted by $A \cup B$ or $B \cup A$, is that event in
which either A occurs or B occurs or both A and B occur. Note that the
connective “or” is used in the inclusive sense of either or both.

Example 4.6. Let ten similar circular disks in a box be marked by the
numbers 1, 2, 3, . . . , 10, respectively. A single disk is selected at random.
Let A denote the event that the number is a positive odd integer less than
11; that is, event A occurs if either 1, 3, 5, 7, or 9 is drawn. Let B denote the
event that the number is a positive integer less than 4; that is, event B occurs
if either 1, 2, or 3 is drawn. Use these two simple events to illustrate the above
definitions.

Since the possible outcomes of the experiment are 1, 2, 3, . . . , 10, the
universal event I occurs if either 1, 2, 3, . . . , or 10 is drawn. Thus, the
complementary event $\bar{A}$ occurs if either 2, 4, 6, 8, or 10 is drawn, and the
complementary event $\bar{B}$ occurs if either 4, 5, . . . , or 10 is drawn. Clearly,
$A \cup \bar{A} = I$, and $B \cup \bar{B} = I$.

Since events A and B have only the numbers 1 and 3 in common, $A \cap B$
is that event in which either 1 or 3 occurs. Further, the event $A \cup B$ occurs
if either 1, 2, 3, 5, 7, or 9 is drawn.

Events can conveniently be expressed in “set” notation. For example,
we may think of A as a collection of numbers 1, 3, 5, 7, 9 and write

$A = \{1, 3, 5, 7, 9\}$

understanding that A is a “set” whose “elements” are the positive integers
1, 3, 5, 7, 9. In a similar manner we write

$I = \{1, 2, 3, \ldots, 10\}$ and $B = \{1, 2, 3\}$

So, in set symbolism, it follows that

$\bar{A} = \{2, 4, 6, 8, 10\}$

$\bar{B} = \{4, 5, 6, 7, 8, 9, 10\}$

$A \cap B = \{1, 3\}$

$A \cup B = \{1, 2, 3, 5, 7, 9\}$

The events “A and B” and “A or B” are called the intersection of A and B
and the union of A and B, respectively. (We may think of “set A” as the
mathematical counterpart of the real-world “event A” and use them
interchangeably.)
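
The set operations of Example 4.6 map directly onto Python's built-in `set`
type, which gives a quick way to check such event calculations. The variable
names below are ours and are chosen only to mirror the notation of the text.

```python
I = set(range(1, 11))      # universal event: disks numbered 1, 2, ..., 10
A = {1, 3, 5, 7, 9}        # odd number drawn
B = {1, 2, 3}              # number less than 4 drawn

not_A = I - A              # complementary event of A
not_B = I - B              # complementary event of B
A_and_B = A & B            # intersection: A and B occur together
A_or_B = A | B             # union: A or B or both occur

print("not A   =", sorted(not_A))
print("not B   =", sorted(not_B))
print("A and B =", sorted(A_and_B))
print("A or B  =", sorted(A_or_B))
```

The operator `|` is inclusive, in agreement with the sense of “or” adopted in
Definition 4.1.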

In practical applications the probability of a simple event, P[A] say,
is usually taken to be an estimate obtained by computing the “appropriate”
relative frequency from observation data, or, in certain cases, it is a value
obtained from the classical interpretation by taking into account a priori
considerations. Since $\bar{A}$, $\bar{B}$, $A \cap B$, and $A \cup B$ are events of the universal
event I with simple events A and B, the probabilities $P[\bar{A}]$, $P[A \cap B]$,
and $P[A \cup B]$ may be determined by the method used in determining P[A]
and P[B]. However, it is usually desirable to compute such probabilities by
applying properties like those given by Eqs. (4.2), (4.3), (4.3a), (4.4), and
(4.4a).

In Sect. 4.2 we discussed the probability of an event in terms of all possible
outcomes in an experiment. Now we consider the probability of an event in
terms of some subset of all the possible outcomes, or, as is commonly stated,
subject to the condition that we restrict the number of possible outcomes.
This concept is illustrated in the following example.

Example 4.7. Use the events defined in Example 4.6 to find the probability
that the number drawn is a positive odd integer on the condition that
the number must be less than 4; that is, find the probability of A given that
B has occurred.

When B occurs, the disk must be marked with 1, 2, or 3. In order for
A to occur, if it is known that B has occurred, the disk must be marked with
1 or 3. Thus, the required probability is 2/3.

Definition 4.2. If an experiment can result in N mutually exclusive and
equally likely possible outcomes, $n(B) \neq 0$ of which correspond to the
occurrence of the event B and $n(A \cap B)$ of which correspond to the occurrence
of the event A if it is given that event B has occurred, then the probability
of event A given that event B has occurred, denoted $P[A \mid B]$, is given by
$n(A \cap B)/n(B)$, or, briefly,

$$P[A \mid B] = \frac{n(A \cap B)}{n(B)}$$

More generally, for any two events A and B such that $P[B] \neq 0$,

$$P[A \mid B] = \frac{P[A \cap B]}{P[B]}$$

Note. The symbol | in $P[A \mid B]$ is read “given.” Thus, we read $P[A \mid B]$ as
“probability of event A given event B” and call it the conditional probability
of event A. It should be obvious that the restricted definition is like the
classical interpretation of probability.
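
For equally likely outcomes, Definition 4.2 reduces to counting within the
restricted set B. A minimal Python sketch follows; the function name
`conditional_probability` is a hypothetical helper of ours, and the events
are those of Example 4.6.

```python
from fractions import Fraction


def conditional_probability(a: set, b: set) -> Fraction:
    """P[A | B] = n(A and B) / n(B) for equally likely outcomes."""
    if not b:
        raise ValueError("P[A | B] is undefined when n(B) = 0")
    return Fraction(len(a & b), len(b))


A = {1, 3, 5, 7, 9}   # odd number drawn
B = {1, 2, 3}         # number less than 4 drawn

p_a_given_b = conditional_probability(A, B)   # the case of Example 4.7
p_b_given_a = conditional_probability(B, A)

print("P[A | B] =", p_a_given_b)
print("P[B | A] =", p_b_given_a)
```

Note that the two conditional probabilities differ, because the restricting
event, and hence the denominator, is different in each case.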

An understanding of terms like “mutually exclusive,” “exhaustive,” and 
“independent events” is useful. Definition 4.3 and Example 4.8 should make 
clear the meanings of such terms. 

Definition 4.3. (a) Two or more events $E_1, E_2, \ldots$ are mutually
exclusive if the occurrence of any one precludes the occurrence of each of the
others.

(b) A collection of events is exhaustive if it includes every possible
occurrence in the experiment.

(c) The event $E_1$ is independent of $E_2$ when

$$P[E_1 \mid E_2] = P[E_1]$$

Note that independent events are defined in terms of probability.

Example 4.8. In addition to the two events A and B defined in Example
4.6, we introduce the three new events

$C = \{4, 5, 6, 7, 8\}$

$D = \{7, 8, 9, 10\}$

$F = \{3, 6, 9\}$

all being events associated with the 10 disks in a box. Use these five events 
(subsets of I) to illustrate Definition 4.3. 

Events B and C are mutually exclusive. Events D and F are not mutually 
exclusive, since they occur simultaneously when the number is 9. 

The collection of events B, C, and D is exhaustive. Also, events B, C, D,
and F make up an exhaustive collection. The probability of the occurrence
of some event in a collection which is exhaustive is 1.

The event C is independent of event D, since the probability of the
occurrence of event C is the same regardless of whether we are restricted to
the numbers of event D or not; that is,

$$P[C \mid D] = \frac{n(C \cap D)}{n(D)} = \frac{2}{4} = 0.5$$

and

$$P[C] = \frac{n(C)}{n} = \frac{5}{10} = 0.5$$

Further, we see that event D is independent of event C, since $P[D \mid C] = 2/5$
and $P[D] = 4/10$ are the same. In general, when an event $E_1$ is independent
of $E_2$, it follows, as will be shown later in the proof of Eq. (4.3a), that $E_2$ is
independent of $E_1$, and thus we say that $E_1$ and $E_2$ are independent events.
Sometimes we say that $E_1$ and $E_2$ are independent in the probability sense.
Note that A is not independent of B, for $P[A \mid B] = 2/3$ and $P[A] = 5/10$ are
different. Note also that B is not independent of A, for $P[B \mid A] = 2/5$ and
$P[B] = 3/10$ are different. Events which are not independent are said to be
dependent events.
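
The terms of Definition 4.3 can be checked mechanically for the five events of
Example 4.8. The sketch below is an illustration under the classical
interpretation; the helper names (`prob`, `cond_prob`, `is_independent_of`,
`mutually_exclusive`) are ours.

```python
from fractions import Fraction

I = frozenset(range(1, 11))          # ten equally likely disks
A = {1, 3, 5, 7, 9}
B = {1, 2, 3}
C = {4, 5, 6, 7, 8}
D = {7, 8, 9, 10}
F = {3, 6, 9}


def prob(event) -> Fraction:
    """Classical probability n(E)/n."""
    return Fraction(len(event & I), len(I))


def cond_prob(e1, e2) -> Fraction:
    """P[E1 | E2] = n(E1 and E2) / n(E2)."""
    return Fraction(len(e1 & e2), len(e2))


def is_independent_of(e1, e2) -> bool:
    """Definition 4.3(c): E1 is independent of E2 when P[E1 | E2] = P[E1]."""
    return cond_prob(e1, e2) == prob(e1)


def mutually_exclusive(e1, e2) -> bool:
    """Two events are mutually exclusive when they share no outcome."""
    return not (e1 & e2)


print("C independent of D:", is_independent_of(C, D))
print("D independent of C:", is_independent_of(D, C))
print("A independent of B:", is_independent_of(A, B))
print("B, C mutually exclusive:", mutually_exclusive(B, C))
print("D, F mutually exclusive:", mutually_exclusive(D, F))
print("B, C, D exhaustive:", (B | C | D) == I)
```

The pair C, D shows that independent events need not be mutually exclusive,
since they share the outcomes 7 and 8.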

Since expressions involving simple events connected with “not,” “and,”
or “or” are also events, we could use the classical or the relative-frequency
concept to determine directly the probability of any such compound event.
However, it is usually simpler to determine the probability of compound
or related events in terms of probabilities of simple events with the aid of
one or more of the rules stated below. (These properties of probabilities of
events A and B are listed together for easy reference, and they hold for any
of the numerical interpretations of probability.)

(4.1)

$P[\bar{A}] = 1 - P[A]$    (4.2)

$P[A \cap B] = P[A] \cdot P[B \mid A] = P[B] \cdot P[A \mid B]$    (4.3)

$P[A \cap B] = P[A] \cdot P[B]$ if A and B are independent    (4.3a)

$P[A \cup B] = P[A] + P[B] - P[A \cap B]$    (4.4)

$P[A \cup B] = P[A] + P[B]$ if A and B are mutually exclusive    (4.4a)

After illustrating these properties we shall show how they follow directly
from the classical concept of probability.

Example 4.9. Use the events defined in Examples 4.6 and 4.8 to illustrate
properties (4.2), (4.3), (4.3a), (4.4), and (4.4a).

The event B can occur in three mutually exclusive and equally likely ways
and the event $\bar{B}$ in seven. Thus $P[B] = 3/10$, $P[\bar{B}] = 7/10$, and
$1 - P[B] = 7/10$. So Eq. (4.2) is verified.

According to Example 4.8, events A and B are not independent. Clearly,
$P[A] \cdot P[B \mid A] = (5/10)(2/5) = 0.2$ and $P[B] \cdot P[A \mid B] = (3/10)(2/3) = 0.2$. But
$A \cap B$ is that event in which the number is 1 or 3, and hence by the classical
interpretation, $P[A \cap B] = 2/10 = 0.2$. This verifies Eq. (4.3). Note that
$P[A] \cdot P[B] = (5/10)(3/10) = 0.15 \neq P[A \cap B]$. Further, since C and D are
independent events with $P[C] \cdot P[D] = (5/10)(4/10) = 0.2$ and
$P[C \cap D] = 2/10 = 0.2$, it follows that Eq. (4.3a) holds (in this special case).

The events A and B are not mutually exclusive. Thus, $P[A] + P[B]
- P[A \cap B] = 5/10 + 3/10 - 2/10 = 6/10$. But $P[A \cup B] = 6/10$, since the event
$A \cup B$ can occur if the number is 1, 2, 3, 5, 7, or 9. Hence Eq. (4.4) is
verified. The events B and C are mutually exclusive, and $B \cup C$ can occur
if the number is 1, 2, 3, 4, 5, 6, 7, or 8. Thus, $P[B \cup C] = 8/10$ and
$P[B] + P[C] = 3/10 + 5/10 = 8/10$, so Eq. (4.4a) is verified. Note that C and D
are not mutually exclusive even though they are independent events.
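
The verifications of Example 4.9 can be repeated with exact fractions. The
sketch below assumes the classical interpretation over the ten disks; `P` is
a small helper of ours, and the keys of `checks` merely label the property
being tested.

```python
from fractions import Fraction

I = set(range(1, 11))
A = {1, 3, 5, 7, 9}
B = {1, 2, 3}
C = {4, 5, 6, 7, 8}
D = {7, 8, 9, 10}


def P(event: set) -> Fraction:
    """Classical probability of an event over the ten equally likely disks."""
    return Fraction(len(event), len(I))


p_b_given_a = Fraction(len(A & B), len(A))
p_a_given_b = Fraction(len(A & B), len(B))

checks = {
    "(4.2)": P(I - B) == 1 - P(B),
    "(4.3)": P(A & B) == P(A) * p_b_given_a == P(B) * p_a_given_b,
    "(4.3a)": P(C & D) == P(C) * P(D),          # C and D are independent
    "(4.4)": P(A | B) == P(A) + P(B) - P(A & B),
    "(4.4a)": P(B | C) == P(B) + P(C),          # B and C are mutually exclusive
}

for label, holds in checks.items():
    print(label, "holds" if holds else "fails")
```

Replacing C and D by A and B in the (4.3a) line makes that check fail, which
agrees with the remark that A and B are not independent.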

Property (4.1) follows directly from the classical interpretation. To prove
Eq. (4.2), let n denote the number of mutually exclusive and equally likely
possible outcomes of an experiment, n(A) of which correspond to the
occurrence of the event A; then n − n(A) of the outcomes correspond to the
occurrence of the event $\bar{A}$. Thus, by the classical interpretation and
substitution,

$$P[\bar{A}] = \frac{n - n(A)}{n} = 1 - \frac{n(A)}{n} = 1 - P[A]$$

By Definition 4.2 we have $P[A \mid B] = n(A \cap B)/n(B)$ when $n(B) \neq 0$. On
dividing both numerator and denominator by n, we have

$$\frac{n(A \cap B)}{n(B)} = \frac{n(A \cap B)/n}{n(B)/n} = \frac{P[A \cap B]}{P[B]}$$

Thus $P[A \mid B] = P[A \cap B]/P[B]$, or $P[A \cap B] = P[B] \cdot P[A \mid B]$ when $P[B] \neq 0$.
If P[B] = 0, we define $P[A \cap B]$ to be zero, so the relation holds. The proof
of $P[A \cap B] = P[A] \cdot P[B \mid A]$ follows by interchanging A and B in the above
proof. In case A is independent of B, we have $P[A \mid B] = P[A]$ and on
substitution in Eq. (4.3) this gives $P[A \cap B] = P[B] \cdot P[A]$. If B is independent
of A, we obtain $P[A \cap B] = P[A] \cdot P[B]$. Since $P[B] \cdot P[A] = P[A] \cdot P[B]$, we see
that the statements “B is independent of A” and “A is independent of B”
lead to the same result. Thus, we feel justified in saying A and B are
independent events.

If the events A and B are mutually exclusive, then the event A [J B can 
result in n(A) -f «(B) possible occurrences. Hence 

P[A U P] = 

n 

_n{A) , n(B) 
n n 


= P[A] + P[B] 
If $A$ and $B$ are not mutually exclusive, then they have $n(A \cap B)$ of the
possible occurrences in common. Thus, the event $A \cup B$ can result in
$n(A) + n(B) - n(A \cap B)$ possible occurrences. In this case it follows that

$$P[A \cup B] = \frac{n(A) + n(B) - n(A \cap B)}{n} = \frac{n(A)}{n} + \frac{n(B)}{n} - \frac{n(A \cap B)}{n} = P[A] + P[B] - P[A \cap B]$$
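
The addition rule for events that are not mutually exclusive can be verified by counting. This is a sketch only: the ten numbered disks and the events `A` and `B` below are assumptions made for illustration.

```python
from fractions import Fraction

def prob(event, outcomes):
    """Classical probability: favorable outcomes over equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

# Hypothetical experiment: one disk drawn from ten disks numbered 1..10.
outcomes = set(range(1, 11))
A = {k for k in outcomes if k % 2 == 0}   # even number (illustrative event)
B = {k for k in outcomes if k % 3 == 0}   # multiple of 3 (illustrative event)

lhs = prob(A | B, outcomes)
rhs = prob(A, outcomes) + prob(B, outcomes) - prob(A & B, outcomes)
print(lhs, rhs)
```

Here the disk marked 6 belongs to both events, so it would be counted twice if the term for the common occurrences were not subtracted.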

Properties (4.3), (4.3a), (4.4), and (4.4a) are easily generalized to more
than two events. See the exercises in Sect. 4.6 for formulas and the examples
below for illustrations.

Example 4.10. What is the probability of obtaining any sequence of
heads $H$ and tails $T$ resulting from five fair tosses of a well-balanced coin?
By the classical interpretation the probability $P[H]$ of obtaining heads
on a given toss is $P[H] = p = \frac{1}{2}$ and the probability $P[T]$ of obtaining tails
on a given toss is $P[T] = P[\text{not } H] = 1 - p = q = \frac{1}{2}$. Since any two tosses
are considered independent, the probability of obtaining the particular
ordering $HHTHT$ is $p \cdot p \cdot q \cdot p \cdot q = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{32}$. This is a generalization of
Property (4.3a). Since $p = q = \frac{1}{2}$, this is the probability of obtaining any one
of the 32 mutually exclusive sequences. Thus the 32 possible sequences
are mutually exclusive and equally likely.

Example 4.11. Thirty pumpkin seeds of the same size and shape, 9 being
of strain A and 21 of strain B, are placed in a box and thoroughly mixed.
One seed is drawn and the strain recorded, and then a second seed is drawn
and the strain recorded. What is the probability that (a) both seeds are
from strain A? (b) One seed is from strain A and the other from strain B?
Let $A_1$ denote a seed of strain A on the first draw and $A_2$ denote a seed
of strain A on the second draw. In (a) we wish to find P[seed of strain A
on the first draw and seed of strain A on the second draw], or $P[A_1 \cap A_2]$.
$A_1$ and $A_2$ are not independent, since the probability of $A_2$ depends on the
first draw. Thus, by Eq. (4.3) we have

$$P[A_1 \cap A_2] = P[A_1] \cdot P[A_2 \mid A_1] = \frac{9}{30} \cdot \frac{8}{29} = 0.083$$

since on the second draw there are only 8 seeds of strain A left and 29 seeds
altogether.

To solve (b) we must find P[seed of strain A and seed of strain B], or
$P[(A_1 \cap \bar{A}_2) \cup (\bar{A}_1 \cap A_2)]$, that is, we first draw a seed of strain A and then
one of strain B, or we first draw a seed of strain B and then one of strain A.

Since $A_1 \cap \bar{A}_2$ and $\bar{A}_1 \cap A_2$ are events which are mutually exclusive, we
may use Eqs. (4.4a) and (4.3) to write

$$\begin{aligned}
P[(A_1 \cap \bar{A}_2) \cup (\bar{A}_1 \cap A_2)] &= P[A_1 \cap \bar{A}_2] + P[\bar{A}_1 \cap A_2] \\
&= P[A_1] \cdot P[\bar{A}_2 \mid A_1] + P[\bar{A}_1] \cdot P[A_2 \mid \bar{A}_1] \\
&= \frac{9}{30} \cdot \frac{21}{29} + \frac{21}{30} \cdot \frac{9}{29} = 0.434
\end{aligned}$$

The method used to solve Example 4.11 illustrates how Formulas (4.3)
and (4.4a) are applied in fairly complicated situations. However, in this case
it is a reasonably simple matter to find these probabilities directly from the
classical interpretation. To do this we must enumerate the total number of
possible equally likely outcomes and the number of outcomes which are
favorable to the occurrence of the event. The total number of possible
outcomes is $30 \cdot 29$, since there are 30 possibilities on the first draw and with each
of these there are 29 possibilities on the second draw. The total number of
outcomes favorable to event $A_1 \cap A_2$ is $9 \cdot 8$, since there are only 9 seeds of
strain A on the first draw and 8 of strain A on the second draw. Thus,

$$P[A_1 \cap A_2] = \frac{9 \cdot 8}{30 \cdot 29} = 0.083$$

Further, the number of outcomes favorable to event $(A_1 \cap \bar{A}_2) \cup (\bar{A}_1 \cap A_2)$
is $9 \cdot 21 + 21 \cdot 9 = 2 \cdot 9 \cdot 21$, since when a seed of strain A is drawn first, there
are 9 possibilities and with each of these there are 21 seeds of strain B possible
on the second draw, and when a seed of strain B is drawn first, there are 21
possibilities and with each of these there are 9 seeds of strain A possible on
the second draw. Thus,

$$P[(A_1 \cap \bar{A}_2) \cup (\bar{A}_1 \cap A_2)] = \frac{2 \cdot 9 \cdot 21}{30 \cdot 29} = 0.434$$
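
The direct enumeration used for Example 4.11 can also be carried out mechanically. The sketch below labels the 30 seeds (the labelling and the variable names are our own device, not part of the example) and lists every ordered pair of distinct seeds.

```python
from fractions import Fraction
from itertools import permutations

# Label the 30 seeds so that every ordered pair of distinct seeds is equally likely.
seeds = ["A"] * 9 + ["B"] * 21
draws = list(permutations(range(len(seeds)), 2))   # 30 * 29 ordered pairs

both_a = sum(1 for i, j in draws if seeds[i] == "A" and seeds[j] == "A")
one_each = sum(1 for i, j in draws if seeds[i] != seeds[j])

p_both_a = Fraction(both_a, len(draws))
p_one_each = Fraction(one_each, len(draws))
print(len(draws), both_a, one_each)
print(round(float(p_both_a), 3), round(float(p_one_each), 3))
```

The counts agree with $30 \cdot 29$, $9 \cdot 8$, and $2 \cdot 9 \cdot 21$ found above.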

Example 4.12. Make a frequency table of the number of heads occurring
in the sequences of Example 4.10.

We could list the 32 possible sequences and count the number of sequences
with 0, 1, 2, 3, 4, and 5 heads. However, the work can be shortened somewhat
if we classify sequences according to number of heads, ignoring order, and
then count the number of orders possible, that is, by counting the distinct
orders of TTTTT, HTTTT, HHTTT, HHHTT, HHHHT, HHHHH. Clearly,
there is only one order for the first (or sixth) set of five symbols. For the
second (or fifth) set of five symbols there are five orders, since H (or T) can
appear in any one of five positions. For the third (or fourth) set of five
symbols there are

$$\tfrac{1}{2}\,[32 - (1 + 1 + 5 + 5)] = 10 \text{ orders}$$

Since 1 + 1 + 5 + 5 of the 32 sequences have already been identified, this
leaves 20 sets of five symbols with either two heads or two tails, there being
10 of each due to symmetry. (A direct method of computing this frequency
will be given in Sect. 4.4.) Thus, we have the following frequency table for
the number of heads appearing on five fair tosses of a balanced coin.


Number of heads    0    1    2    3    4    5
Frequency          1    5   10   10    5    1
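
The listing method mentioned at the start of Example 4.12 is tedious by hand but easy to mechanize. The following sketch simply generates the 32 sequences and tallies the number of heads in each; the variable names are our own.

```python
from collections import Counter
from itertools import product

# All 2**5 equally likely sequences of H and T for five tosses.
sequences = ["".join(seq) for seq in product("HT", repeat=5)]

# Frequency of each head count, in the spirit of Example 4.12.
frequency = Counter(seq.count("H") for seq in sequences)
for heads in range(6):
    print(heads, frequency[heads])
```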


Example 4.13. Use the events defined in Examples 4.6 and 4.8 to find
$P[A \cup (C \cap D)]$.

We find this probability by two methods. First, identify the event
$A \cup (C \cap D)$ and use the classical interpretation of probability. The event $A$
denotes a disk marked 2, 4, 6, 8, or 10, and the event $C \cap D$ denotes a disk
marked 7 or 8. Thus, event $A \cup (C \cap D)$ denotes a disk marked 2, 4, 6, 7,
8, or 10. Thus,

$$P[A \cup (C \cap D)] = \frac{6}{10} = 0.6$$

since there are 10 disks and six are favorable to the occurrence of event
$A \cup (C \cap D)$.

The second method is repeatedly to apply the formulas on p. 112 so as
to reduce $P[A \cup (C \cap D)]$ to an expression involving probabilities of single
events and then use the classical interpretation of probability and conditional
probability. Thus, we have

$$\begin{aligned}
P[A \cup (C \cap D)] &= P[A] + P[C \cap D] - P[A \cap (C \cap D)] \\
&= P[A] + P[C \cap D] - P[C \cap D] \cdot P[A \mid (C \cap D)] \\
&= P[A] + P[C] \cdot P[D \mid C] \cdot (1 - P[A \mid (C \cap D)]) \\
&= \frac{5}{10} + \frac{2}{10}\left(1 - \frac{1}{2}\right) = 0.6
\end{aligned}$$

4.4. FORMULAS FOR COUNTING

Examples 4.11 and 4.12 indicate that some systematic method of enumerating
the possible ways events can occur would be desirable in order to
determine the probability of simple or compound events. We now present
some methods which aid in the problem of counting.

4.4.1. Permutations

Consider a set of $n$ different objects, such as books on a shelf. Let $x \le n$
of the objects be selected and arranged in a line from left to right, such as
$x$ books on an empty second shelf. Any arrangement of any $x$ objects is called
a permutation of $x$ objects selected from $n$.

A different permutation is obtained by interchanging any two objects of
a given permutation. Clearly, there are many permutations when $x$ is large.
We wish to count the number of different permutations of $n$ distinct objects
taken $x$ at a time. For example, if the objects are denoted by a, b, c, and $x = 2$,
the six permutations ab, ba, ac, ca, bc, cb represent all permutations of
three objects taken two at a time.

In the general case, think of $x$ positions on a line as being fixed and
determine the number of ways in which $x$ objects can be selected from $n$
and placed on the line. The position on the extreme left, that is, the first
position, can be filled in $n$ ways. Once this position is filled, there remain
$(n - 1)$ objects to place in the second position. Thus, for each choice in the
first position there are $(n - 1)$ choices for the second position. Hence, there
are $n(n - 1)$ choices for the first two positions. Repeating this argument,
we find there are $n - x + 1$ objects left with which to fill the $x$th position.
Hence, there is a total of $n(n - 1) \cdots (n - x + 1)$ ways in which to place $n$
objects in $x$ positions. Letting ${}_nP_x$ denote the number of permutations of
$n$ objects taken $x$ at a time, we may write

$${}_nP_x = n(n - 1) \cdots (n - x + 1) \qquad (4.5)$$


For example, eight out of a class of 20 students may occupy the eight front
seats in a classroom in ${}_{20}P_8 = 20 \cdot 19 \cdots (20 - 8 + 1) = 20 \cdot 19 \cdots 13$ ways.
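
The counting argument behind Eq. (4.5) translates directly into a short product. The sketch below is one possible rendering; the helper name `n_p_x` is our own, and the standard-library function `math.perm` is used only as a cross-check.

```python
import math
from itertools import permutations

def n_p_x(n, x):
    """Number of permutations of n distinct objects taken x at a time, Eq. (4.5)."""
    count = 1
    for factor in range(n, n - x, -1):   # n, n-1, ..., n-x+1
        count *= factor
    return count

# The six permutations of a, b, c taken two at a time.
pairs = ["".join(p) for p in permutations("abc", 2)]
print(pairs)

# Seating eight of 20 students in the eight front seats.
print(n_p_x(20, 8), math.perm(20, 8))
```

The loop multiplies exactly $x$ factors, one for each position filled from left to right.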

Theorem 4.1. The number of ways of permuting $x$ objects selected from $n$
distinct objects is given by

$${}_nP_x = \frac{n!}{(n - x)!} \qquad (4.6)$$

Proof. Since ${}_nP_x = n(n - 1) \cdots (n - x + 1)$, we may write

$${}_nP_x = n(n - 1) \cdots (n - x + 1) \cdot \frac{(n - x)(n - x - 1) \cdots 2 \cdot 1}{(n - x)(n - x - 1) \cdots 2 \cdot 1} = \frac{n!}{(n - x)!}$$

When $x = n$, Eq. (4.5) reduces to

$${}_nP_n = n(n - 1) \cdots 1 = n! \qquad (4.7)$$

If we let $x = n$ in Eq. (4.6), 0! appears in the denominator. We define 0!
to be 1, noting that this definition is consistent with the property that
$n!/n = (n - 1)!$ when $n = 1$.

Sometimes we do not have $n$ different objects. Instead, we may have $n_1$
objects of one kind, $n_2$ objects of a second kind, ..., $n_k$ objects of a $k$th kind
so that $n_1 + n_2 + \cdots + n_k = n$. Let the number of permutations of these $n$
objects be denoted by ${}_nP(n_1, \ldots, n_k)$. Obviously, ${}_nP(n_1, \ldots, n_k)$ is less than
${}_nP_n$ when some $n_i$ $(i = 1, 2, \ldots, k)$ is different from one. For the moment
suppose the $n_1$ like objects of the first kind are made different by markings.
Thus they can be arranged into $n_1!$ different orders. In a similar way the
$n_2$ like objects of the second kind may be made different momentarily to
give $n_2!$ times as many permutations as before. Continuing this procedure,
the total number of permutations after all like objects have momentarily
been made different is $n_1!\, n_2! \cdots n_k!$ times as large as the number of
permutations ${}_nP(n_1, \ldots, n_k)$. But the number of permutations in this case is ${}_nP_n$, that is,

$$n_1!\, n_2! \cdots n_k! \cdot {}_nP(n_1, \ldots, n_k) = {}_nP_n$$

Thus it follows that

$${}_nP(n_1, \ldots, n_k) = \frac{{}_nP_n}{n_1!\, n_2! \cdots n_k!}$$

or

$${}_nP(n_1, \ldots, n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!} \qquad (4.8)$$

For example, the number of permutations of 3 dimes, 2 nickels, and 1 penny
is ${}_6P(3, 2, 1) = 6!/(3!\,2!\,1!) = 60$.
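
Eq. (4.8) can be checked against a brute-force count for the coin example. In this sketch the function name `multiset_permutations` and the letters D, N, P standing for dime, nickel, and penny are our own labels.

```python
import math
from itertools import permutations

def multiset_permutations(counts):
    """n! / (n_1! n_2! ... n_k!) for n = sum(counts), as in Eq. (4.8)."""
    total = math.factorial(sum(counts))
    for c in counts:
        total //= math.factorial(c)
    return total

# 3 dimes (D), 2 nickels (N), 1 penny (P): count the distinct orderings directly.
distinct = set(permutations("DDDNNP"))
print(multiset_permutations([3, 2, 1]), len(distinct))
```

The brute-force count treats the six coins as momentarily different (720 arrangements) and then discards duplicates, which mirrors the argument in the text.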

4.4.2. Combinations

Next suppose that we have $n$ distinct objects from which we wish to
select $x$ without regard to their arrangement. Such an unordered selection
is called a combination. Thus, if two students are selected from the front row,
which contains three, say Bob, Joe, and Ted, the selection of Bob and Joe
is the same combination as Joe and Bob. The total number of combinations
of two students is three in this case, namely Joe and Bob, Bob and Ted,
and Ted and Joe.

In order to determine the number of combinations in the general case,
let $\binom{n}{x}$ denote the total number of combinations possible in selecting $x$ objects
from $n$ different objects. The number of permutations of any given selection
(combination) of $x$ objects is $x!$. Thus the total number of permutations of
$n$ different objects selected $x$ at a time is

$$x!\binom{n}{x} = {}_nP_x = \frac{n!}{(n - x)!}$$

Hence it follows that

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!} \qquad (4.9)$$

Note that $\binom{n}{x} = \binom{n}{n - x}$.

Example 4.14. In how many ways can two students be selected from
three?

By Eq. (4.9) we obtain

$$\binom{3}{2} = \frac{3!}{2!\,1!} = 3$$

which is the same value obtained earlier by listing all combinations.
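
The agreement between Eq. (4.9) and a direct listing can be confirmed with a few lines. This is a sketch using the three students named earlier; `math.comb` is the standard-library binomial coefficient and serves only as a cross-check.

```python
import math
from itertools import combinations

students = ["Bob", "Joe", "Ted"]

# Unordered selections of two students; order within a pair does not matter.
pairs = list(combinations(students, 2))
print(pairs)

# Eq. (4.9): n! / (x! (n - x)!), compared with the standard-library value.
n, x = 3, 2
by_formula = math.factorial(n) // (math.factorial(x) * math.factorial(n - x))
print(by_formula, math.comb(n, x), len(pairs))
```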

Example 4.15. A balanced coin is fairly tossed five times. What is the
probability of obtaining three heads and two tails?

From Example 4.10 we found that the probability of obtaining any
particular order of five simple events is $\frac{1}{32}$. The number of possible orderings
of three heads and two tails is given by

$$\frac{5!}{3!\,2!} = 10$$

(see Example 4.12 for another method). But these ten orderings represent
mutually exclusive ways in which the compound event of three heads and
two tails can occur. Thus, the desired probability, if we use a generalization
of Eq. (4.4a), is given by

$$P[\text{3 heads and 2 tails}] = \underbrace{\tfrac{1}{32} + \cdots + \tfrac{1}{32}}_{10 \text{ times}} = \frac{10}{32}$$

Example 4.16. Consider a dichotomous type of experiment in which a
success (event $E$, say) occurs with probability $p$, and a failure (event $\bar{E}$) occurs
with probability $1 - p = q$ in a single trial. Find the probability of exactly
$x$ successes in $n$ independent trials of the experiment.

Consider a particular sequence of $x$ consecutive successes followed by
$n - x$ failures. The probability of this particular sequence, if we use a
generalization of Eq. (4.3a), since all trials are independent, is given by

$$\underbrace{p \cdot p \cdots p}_{x \text{ times}} \cdot \underbrace{q \cdot q \cdots q}_{(n - x) \text{ times}} = p^x q^{n - x} \qquad (4.10)$$

The probability of obtaining $x$ successes and $n - x$ failures in any other
sequence is the same as Eq. (4.10), since the p’s and q’s are simply rearranged
to correspond to the new sequence of successes and failures.

It is necessary that we count the number of different sequences in order
to find the required probability. The number of sequences is simply the
number of permutations of $n$ objects, $x$ of which are of one kind ($p$), and
$n - x$ of which are of a second kind ($q$). Using Eq. (4.8), we find this to be

$$\frac{n!}{x!\,(n - x)!} \qquad (4.11)$$
Now these $n!/[x!\,(n - x)!]$ sequences represent mutually exclusive compound
events, each with a probability of $p^x q^{n - x}$. Thus, by a generalization of Eq.
(4.4a), the probability of one or the other of these sequences is the sum of
$n!/[x!\,(n - x)!]$ probabilities $p^x q^{n - x}$, that is, the required probability is the
product of the quantities in Eqs. (4.10) and (4.11). Since $x$ may have any
one of the values $0, 1, \ldots, n$, we think of the probability of obtaining $x$
successes in $n$ independent trials of an event which occurs with probability
$p$ in a single trial as a function of $x$ given by

$$b(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x} \qquad (4.12)$$

where $x = 0, 1, \ldots, n$.

The function (4.12) is known as the binomial density function or Bernoulli
density function. The name binomial comes from the fact that Eq. (4.12) is
a term in the expansion of the binomial $[(1 - p) + p]^n$ or $(q + p)^n$. We see
that $b(x; n, p)$ is a density function $f(x) = b(x; n, p)$ since

$$\sum_{x=0}^{n} b(x; n, p) = (1 - p)^n + n(1 - p)^{n-1} p + \frac{n(n - 1)}{2}(1 - p)^{n-2} p^2 + \cdots + p^n = [(1 - p) + p]^n = 1$$
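
The binomial density is simple to evaluate numerically. The following is a minimal sketch of Eq. (4.12); the function name `binomial_density` and the values $n = 5$, $p = 0.3$ in the second check are our own choices for illustration.

```python
import math

def binomial_density(x, n, p):
    """b(x; n, p): probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

# Three heads in five tosses of a balanced coin (compare Example 4.15).
print(binomial_density(3, 5, 0.5))

# The densities for x = 0, 1, ..., n add to one (up to rounding error).
n, p = 5, 0.3   # illustrative values, not taken from the text
total = sum(binomial_density(x, n, p) for x in range(n + 1))
print(total)
```

The first value equals $10/32$, and the second check illustrates numerically that the terms of the binomial expansion sum to one.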

The binomial distribution can be applied to many practical problems. We
consider one use of the binomial now, leaving others to be given later.

Example 4.17. It is known from long experience that a certain manufacturing
process produces one per cent defective units. What is the probability
that there will be more than one defective unit in a random selection
of 100 units?

We wish to find $b(2) + b(3) + \cdots + b(100)$. In this case, it is easier to
find $b(0; 100, 0.01)$ and $b(1; 100, 0.01)$ first and then find the required
probability by subtraction, that is,

$$\begin{aligned}
P[x \ge 2] &= 1 - b(0) - b(1) \\
&= 1 - \frac{100!}{0!\,100!}(0.01)^0 (0.99)^{100} - \frac{100!}{1!\,99!}(0.01)^1 (0.99)^{99} \\
&= 1 - (1.99)(0.99)^{99} \\
&= 1 - 0.736 = 0.264
\end{aligned}$$

since $\log (0.99)^{99} = 99(-1 + 0.99564) = 9.56836 - 10$, so that $(0.99)^{99} =
0.370$ and $(1.99)(0.99)^{99} = 0.736$.
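
The logarithm computation in Example 4.17 can be replaced by direct evaluation. This sketch repeats the subtraction $1 - b(0) - b(1)$ with Eq. (4.12); the function name `binomial_density` is our own.

```python
import math

def binomial_density(x, n, p):
    """b(x; n, p) from Eq. (4.12)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 100, 0.01
p_more_than_one = 1 - binomial_density(0, n, p) - binomial_density(1, n, p)
print(round(p_more_than_one, 3))
```

Working with the complement needs only two terms, whereas summing $b(2)$ through $b(100)$ directly would need 99.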

From Example 4.17 it is clear that obtaining certain probabilities or
cumulative probabilities for a binomial distribution involves lengthy
calculations. To aid in solving problems in which the binomial distribution is used,
tables have been presented in Ref. [12, 13, 15].

4.5. DISTRIBUTION, PROBABILITY, AND RANDOM VARIABLE 

In this chapter we have largely restricted our discussion of probability 
of events to the discrete case. This restriction was not necessary, since the 
relative-frequency interpretation of probability of events does not exclude 
the continuous case, and since we assume that the basic properties in this 
chapter are also valid for the continuous case. 

In Chap. 3 we discussed mathematical properties of probability density 
and distribution functions defined over all or part of the real numbers x. 
We have not discussed the problem of assigning values of x to the events 
in an experiment, since this is usually obvious in a particular experiment. 
The only requirement is that x be a real-valued function defined over all possi- 
ble occurrences (component or elementary events) in an experiment. This 
means that exactly one real number must be assigned to each occurrence 
and that each possible occurrence has such an assignment. Such a real-valued 
function, x, is also called a random variable (or variate or chance variable), 
indicating that its dependent variable is a special kind of variable. The 
random variable is discrete if it is capable of assuming only a finite or count- 
ably infinite number of distinct values and is continuous in an interval if it 
can assume any value lying in that interval. 

In this chapter we have discussed the probability of events of an experi- 
ment; in Chap. 3 we discussed the probability density and distribution 
functions of a real-valued function x, called a random variable, defined over 
all the elementary events of an experiment. Thus, once the values of the 
random variable x are assigned to all the elementary events of an experiment, 
we can relate the properties of probability discussed in this chapter to those 
in Chap. 3. Changing the statements about probabilities of events to state- 
ments about probabilities of random variables, we see how the properties 
of distributions given in Chap. 3 are rooted in the more practical approach 
to probability given in this chapter. 

In terms of an actual experiment, it may be clearer to say that
a random variable assumes values with associated probabilities. Before a
particular experiment is run, the outcome is an uncertain value, and, so far 
as we know, it can be any value in a range. It is in this sense that the variable 
depends on chance, and it is in this sense that we shall often think of the 
variable. 
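
The idea of a random variable as a real-valued function on the elementary events can be made concrete. The sketch below assumes an illustrative experiment of our own choosing (two balanced dice) and takes the total as the random variable; the density of $x$ is then obtained by adding the probabilities of the elementary events assigned to each value.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Elementary events: ordered results of two balanced dice (an illustrative experiment).
elementary_events = list(product(range(1, 7), repeat=2))

def x(event):
    """Random variable: assigns exactly one real number (the total) to each event."""
    return event[0] + event[1]

# Density of x: add the probabilities of the elementary events mapped to each value.
density = defaultdict(Fraction)
for event in elementary_events:
    density[x(event)] += Fraction(1, len(elementary_events))

for value in sorted(density):
    print(value, density[value])
```

Every elementary event receives exactly one value, so the probabilities attached to the values of $x$ necessarily add to one.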

4.6. EXERCISES 

4.10. A well-balanced coin is tossed four times. Find the probability of obtain- 
ing exactly zero heads; exactly one head; exactly two heads. 
4.11. Two fuses in a box of ten are defective. If three fuses are selected at the
same time, what is the probability that at least one is defective?

4.12. (a) In how many ways may seven people be ordered on a bench? (b)
In how many ways may seven people be ordered around a circular table?

4.13. Compare the probability of rolling a 10 with two dice with that of rolling
a 15 with three dice.

4.14. A box contains eight similar circular disks, each bearing exactly one of
the symbols 1, 2, 3, ..., 8. If two disks are drawn at the same time,
what is the probability that (a) 1 and 2 are selected? (b) 1 or 2 is selected?
(c) Neither 1 nor 2 is selected? (d) Both 1 and 2 are not selected?

4.15. Three similar coins are tossed. What is the probability of getting exactly
two faces alike?

4.16. How many numbers can be formed by rearranging the digits in the
number 4130131 when numbers beginning with 0 are excluded?

4.17. Box A contains three white and four black balls. Box B contains two
white and three black balls. The boxes are alike and all balls are the same
size. (a) If two balls are chosen from each box, what is the probability
that they will be the same color? (b) If a box is selected (at random) and
two balls drawn from it, what is the probability that they will be the
same color?

4.18. A box contains 12 similar circular disks, each bearing exactly one of the
symbols 2, 3, 4, ..., 13, so that the universal event is made up of the
numbers 2, 3, 4, ..., 13. Let A denote the event that a number is a factor
of 36, B the event that a number is a multiple of 3, C the event that a
number is a remainder on division by 9, D the event that a number is
a positive odd integer, and E the event that a number is of the form
4k + 2, where k = 0, 1, 2. (a) Which pairs of events are mutually
exclusive? (b) Which pairs of events are independent? Explain. (c) Which
three events, if any, are independent? Explain. (d) An event is made
up of the numbers 3, 5, 6, 7. Express this as a compound event in terms
of some of the simple events A, B, C, D, and E. (e) If possible, express
the event made up of the numbers 11, 12, 13 in terms of some of the
simple events A, B, C, D, and E.

4.19. If $n$ and $x$ are nonnegative integers such that $(n - 1) \ge x$, prove that

O-C-D^C.') 

4.20. Let A, B, and C denote any three events. (a) Prove that

$$P[A \cap B \cap C] = P[A] \cdot P[B \mid A] \cdot P[C \mid A \cap B] \qquad (4.13)$$

(b) Write $P[A \cap B \cap C]$ as the product of three factors in five other
ways. (c) Write an expression similar to Eq. (4.13) for n events.

4.21. Let A, B, and C denote any three events. (a) Prove that

$$P[A \cup B \cup C] = P[A] + P[B] + P[C] - P[A \cap B] - P[A \cap C] - P[B \cap C] + P[A \cap B \cap C] \qquad (4.14)$$

(b) Write an expression similar to Eq. (4.14) for n events. What is the
form of this expression when the n events are mutually exclusive?

4.22. A well-balanced die is tossed and the number n on the top face observed. 
Then n well-balanced dice are tossed together. Find the probability that 
a total of exactly nine points will show. 

4.23. Six cards of the same size have identical backs. Three cards have a red 
face, two cards a white face, and one card a blue face. (a) If the first two
cards drawn have red faces, what is the probability that the third card 
is red? (b) If three consecutive cards are drawn, what is the probability 
that they all have different colors? (c) Compare the probability of draw- 
ing the sequence red, white, white with the probability of drawing the 
sequence white, red, red. (d) If three consecutive cards are drawn, what 
is the probability that the last (third) card is red? What is the probability 
that the last card is blue? 

4.24. The probabilities of joint (or compound) events are often easier to study
with the aid of a table, particularly when the simple events are not mutually
exclusive. We consider the case of two simple events A and B. On
a single trial of an experiment one and only one of the following events
occurs: $A \cap B$, $A \cap \bar{B}$, $\bar{A} \cap B$, $\bar{A} \cap \bar{B}$. Letting a, b, c, d denote the
number of cases favorable to the occurrence of the mutually exclusive
events $A \cap B$, $A \cap \bar{B}$, $\bar{A} \cap B$, $\bar{A} \cap \bar{B}$, respectively, we may present the
information as shown in Table 4.4. We see, for example, that there are
a + b cases favorable to the occurrence of A, b + d cases favorable to
the occurrence of “not B,” and n possible cases.


Table 4.4

Mutually Exclusive Joint Events in Terms of Two Simple Events

             $B$        $\bar{B}$    Total
$A$          a        b          a + b
$\bar{A}$    c        d          c + d
Total        a + c    b + d      a + b + c + d = n


In a particular experiment there are 30 balls of the same size and weight.
Suppose that 11 are painted blue and marked by x, 7 are painted red
and marked by x, 9 are blue and not marked, and 3 are red and not
marked. Make a table like Table 4.4 and use it to determine (a) the
probability of a ball’s being red or marked, (b) the probability of a red
ball being marked, (c) if color is independent of marking.

4.25. The events and number of cases shown in Table 4.4 may also be presented
in a Venn diagram, shown in Fig. 4.2. Use such a diagram to answer
the questions in Exercises 4.17 and 4.24.

[Fig. 4.2. Venn diagram of Table 4.4]

4.26. The Venn diagram is particularly useful in problems with three events
A, B, C which are not mutually exclusive. Such a diagram is made by
drawing three intersecting circles such that eight regions, like those in
Fig. 4.3, are formed. These regions represent the mutually exclusive
joint events $A \cap B \cap C$, $A \cap B \cap \bar{C}$, $A \cap \bar{B} \cap C$, $\bar{A} \cap B \cap C$,
$A \cap \bar{B} \cap \bar{C}$, $\bar{A} \cap B \cap \bar{C}$, $\bar{A} \cap \bar{B} \cap C$, $\bar{A} \cap \bar{B} \cap \bar{C}$. (It should be
noted that the Venn diagram is restricted to cases with two or three
simple events, that is, to two or three circles.)



In Fig. 4.3 we may think of events A, B, and C as referring to the
sets of students enrolled in French, German, and Russian, respectively.
Thus, we may think of 30 students, majoring in a foreign language at
a school, as being distributed as indicated in the diagram. That is, three
students are enrolled in the three languages, four students are enrolled
in French and German only, two students are enrolled in French and
Russian only, five students are enrolled in German only, etc. Use Fig.
4.3 to answer the following questions. (a) How many of the 30 students
are not enrolled in any of the three languages? How many are enrolled
in French? In German? In Russian? (b) How many are enrolled in both
French and German? Both French and Russian? Both German and
Russian? (c) If the proportion of students in the various categories is
assumed to be constant from year to year, what is the probability that
a student enrolled in Russian will also be enrolled in German? (d) If
the proportion of students in the various categories is assumed to be
constant, what is the probability that a student enrolled in both French
and German will also be enrolled in Russian?

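4.27. If $P[A \cap B] = P[A] \cdot P[B]$, prove that $P[A \cap \bar{B}] = P[A] \cdot P[\bar{B}]$,
$P[\bar{A} \cap B] = P[\bar{A}] \cdot P[B]$, and $P[\bar{A} \cap \bar{B}] = P[\bar{A}] \cdot P[\bar{B}]$. Note that A and B are
independent when $P[A \cap B] = P[A] \cdot P[B]$.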

4.28. It is possible to have A and B independent, A and C independent, and
B and C independent and still to have $A \cap B$ and C, say, dependent or
to have A, B, and C dependent. That is, it is possible for three events
A, B, and C to be pairwise independent and not be mutually independent.
For this reason we say events A, B, and C are mutually independent if

$$\begin{aligned}
P[A \cap B] &= P[A] \cdot P[B] \\
P[A \cap C] &= P[A] \cdot P[C] \\
P[B \cap C] &= P[B] \cdot P[C] \\
P[A \cap B \cap C] &= P[A] \cdot P[B] \cdot P[C]
\end{aligned} \qquad (4.15)$$

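hold. (a) Let A, B, C be events such that $P[A \cap B \cap C] = P[A \cap \bar{B} \cap \bar{C}]$
$= P[\bar{A} \cap B \cap \bar{C}] = P[\bar{A} \cap \bar{B} \cap C] = 0$ and $P[A \cap B \cap \bar{C}] =$
$P[A \cap \bar{B} \cap C] = P[\bar{A} \cap B \cap C] = P[\bar{A} \cap \bar{B} \cap \bar{C}] = \frac{1}{4}$. Show, using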
a Venn diagram, that the events are not independent. (b) Show that the
events illustrated in Fig. 4.3 are not independent. (c) Write the set of
11 probability statements, corresponding to Eq. (4.15), which are
sufficient for the mutual independence of events A, B, C, D. Write the
conditions which are sufficient for the mutual independence of events $A_1$,
$A_2, \ldots, A_n$. (d) In applications it appears that practically all, if not all,
events which are pairwise independent are also mutually independent.
Thus, the distinction between pairwise and mutual independence is largely
of theoretical interest. The reader might wish to try to give a practical
application (interpretation) of (a) in Exercise 4.28 or to describe any
other application.

Hint. See the discussion in Ref. [4].

4.29. In the definition for mutual independence of three events given in
Exercise 4.28, the system of equations (4.15) may be replaced by the system
of eight ($2^3$) equations of the form

$$P[A' \cap B' \cap C'] = P[A'] \cdot P[B'] \cdot P[C'] \qquad (4.16)$$

where $A'$ denotes either $A$ or $\bar{A}$, $B'$ either $B$ or $\bar{B}$, and $C'$ either $C$ or $\bar{C}$.
(a) Show that Eq. (4.16) follows from Eq. (4.15). (b) Use Eq. (4.16) to
solve (a) and (b) of Exercise 4.28. (c) Write the conditions corresponding
to Eq. (4.16) which are sufficient for the mutual independence of events
$A_1, A_2, \ldots, A_n$.

■y = 2', n = 1. 2. 3* • positive integer 


of box B are thoroughly mixed. Then five coins are taken from box
B and placed in box A. What is the probability that the dime is in box
A?

4.32. In a bridge game North and South were dealt 11 trumps. What is the
probability that East and West were each dealt one trump?

4.33. For purposes of bidding in the game of bridge, we commonly let an
ace count four points, a king three points, a queen two points, a jack
one point, and other cards zero points. If the maximum point count for
partners is assumed to be 40, what is the probability of partners holding
exactly 33 points between them?

4.34. Definition 4.2 should usually be applied in the computation of simple
conditional probabilities. However, there are times when the very
controversial (Refs. [3, 10, 17]) Bayes' formula [see Eq. (4.18)] is
applied. According to Bayes' theorem, if $B_1, \ldots, B_n$ are mutually exclusive
and exhaustive events and if $A$ can only occur in combination with one
of the $n$ events $B_i$, then

$$P[B_i \mid A] = \frac{P[B_i] \cdot P[A \mid B_i]}{\sum_{j=1}^{n} P[B_j] \cdot P[A \mid B_j]}, \qquad i = 1, \ldots, n \qquad (4.18)$$

Prove Eq. (4.18).

4.35. Suppose three machines $M_1$, $M_2$, and $M_3$ in a factory make exactly the
same parts and that they are packed in the same type of boxes. Suppose
machine $M_i$ ($i = 1, 2, 3$) makes $n_i$ boxes per day, of which $p_i$ per cent,
on the average, are declared defective. If from all boxes produced by
the three machines in a single day we select (at random) a box and from
this box we select (at random) a part which proves to be defective, what
is the probability that we have chosen a box made by $M_1$?

4.36. Prove that

$$P[A_1 \cup A_2 \cup \cdots \cup A_n] = 1 - P[\bar{A}_1] \cdot P[\bar{A}_2] \cdots P[\bar{A}_n]$$

when $A_1, \ldots, A_n$ are independent events.

4.37. We have generally illustrated the properties of probability in terms of
number of favorable occurrences of discrete events. However, there are
problems in which probability is easiest to determine in terms of
admissible regions in space, that is, in terms of line segments in one dimension,
areas in two dimensions, and volumes in three dimensions. In such
situations any finite portion of the space, no matter how small, contains
an infinite number of points, making the usual method of counting the
number of favorable occurrences meaningless. In the geometrical problem
it is usually assumed that the probability (or probability measure) of any
subdivision of the admissible region is directly proportional to the size
of the subdivision, even though it is possible to give a geometrical
treatment to problems in cases where probability of a subdivision of a fixed
size varies with location. For example, when the admissible region is
a square of area $A$, the probability measure of any section (of the square)
with area $a$ is $a/A$. We now illustrate geometrical probabilities with the
Buffon needle problem.

Consider an infinite set of ruled parallel lines such that any two
adjacent lines are $d$ units apart. If a needle of length $l$ ($l < d$) is randomly
tossed on the set of lines, what is the probability that the needle will
intersect one of the lines?

Let $y$ denote the distance from the mid-point of the needle to the
nearest line, and let $x$ denote the angle the needle makes with the
perpendicular. The admissible values of $x$ and $y$ are given by $-(\pi/2) < x < (\pi/2)$
and $0 < y < (d/2)$, respectively. By constructing a figure, as
Fig. 4.4, it is clear that $y < (l/2) \cos x$ when the needle intersects the nearest
line; otherwise, $y > (l/2) \cos x$. Thus, the curve defined by $y = (l/2) \cos x$
is the boundary between the region of intersection and nonintersection.
In the rectangular co-ordinate system of $x$ and $y$, as in Fig.
4.5, the admissible region for the experiment is represented by a rectangle
$\pi$ units long and $d/2$ units high, and the admissible region for a single
intersection of the needle and a line is represented by the area of the
rectangle which is under the curve. Thus, the probability that the needle
will intersect a line one time is given by

$$\frac{\int_{-\pi/2}^{\pi/2} \frac{l}{2} \cos x \, dx}{\pi \cdot \frac{d}{2}} = \frac{l}{\pi d / 2} = \frac{2l}{\pi d}$$
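
The result $2l/(\pi d)$ can be checked by simulating the experiment. The following is a rough Monte Carlo sketch under the same assumptions as the derivation ($x$ uniform on $(-\pi/2, \pi/2)$, $y$ uniform on $(0, d/2)$, and $l < d$); the function name, the seed, and the values of $l$ and $d$ are our own illustrative choices.

```python
import math
import random

def buffon_estimate(l, d, trials, seed=0):
    """Estimate P[needle of length l crosses a line] for line spacing d, with l < d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(-math.pi / 2, math.pi / 2)   # angle with the perpendicular
        y = rng.uniform(0, d / 2)                    # mid-point to nearest line
        if y < (l / 2) * math.cos(x):
            hits += 1
    return hits / trials

l, d = 1.0, 2.0   # illustrative values with l < d
estimate = buffon_estimate(l, d, trials=200_000)
exact = 2 * l / (math.pi * d)
print(estimate, exact)
```

The estimate is the fraction of simulated points $(x, y)$ that fall under the curve $y = (l/2) \cos x$, so it approximates the ratio of areas used in the derivation.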

For the case where $l > d$, show that the probability of the needle
intersecting at least one line is

4.38. A triangular figure is tossed on a board ruled as in Buffon’s needle
problem. If the longest side of the triangle has length less than $d$, what
is the probability that some part of the triangular figure will cover any
portion of a line?

SAMPLING AND SAMPLING DISTRIBUTIONS:
EXPECTATION AND ESTIMATION


The value of a given statistic can be expected to vary from sample to 
sample. This variation is studied in terms of sampling distributions (of 
means, proportions, totals, variances, etc.) determined from both finite and 
infinite distributions. Some desirable properties of statistics are discussed. 
The method of maximum likelihood for estimating parameters is described. 


5.1. INTRODUCTION 

A portion of a population of values, namely a sample, is normally used 
to study the population or the characteristics of the population. Since many 
possible samples can be drawn from a given population, many possible 
impressions or estimates of the nature of the population can be inferred. 
Thus, it is desirable that we select a measure to characterize the sample which 
corresponds to a parameter in the population and which shows relatively 
little variation from sample to sample. This leads us to a study of fluctuations 
of the measure or measures selected to characterize the sample and to con- 
siderations of desirable properties for this measure (or these measures) to 
possess.

5.2. SAMPLE AND STATISTIC

Let $x_1, x_2, \ldots, x_n$ denote the set of numerical values of $n$ observations
selected from a larger set, the set being either a discrete one or a continuum.
We refer to the smaller set as a sample of size $n$ drawn from a univariate
population. A numerical value determined from some or all of the values
which make up the sample is called a statistic. One or more statistics may be
used in describing or characterizing a sample of observations. We now
consider some of the simple statistics used in drawing inferences concerning the
larger number of similar observations which might have been obtained.

Since the primary function of a statistic is to lead to a clear understanding
of the nature of the population, and since the population is usually studied
by looking at the parameters, it is natural that we name and attempt to
define statistics as we did parameters. The notation for a parameter and the
corresponding statistic usually differ, particularly if the statistic is one which
is often used. Following the custom, we usually denote parameters by
Greek letters and the corresponding statistics by the corresponding Latin
letters. Thus, the standard deviation for a population is denoted by $\sigma$ and
the standard deviation for a sample by $s$. Similarly, $\mu$ and $m$ are used to
denote the mean of a population and a sample, respectively. Generally, we
use $\bar{x}$ in place of $m$ in this book, since this is common practice and since
$m$ is used for other purposes.

The sample mean J of a sample of size n is given by 



The sample median is denoted by and is defined to be the middle numerical 
value if It IS odd and the average (arithmetic mean) of the two middle values 
if n 1$ even Corresponding to each parameter which measures central ten- 
dency in the population, we define a statistic which measures central 
tendency in the sample in exactly the same way The same notation is used 
to denote both parameter and statistic for all measures of central tendency 
except the mean and median, since there n little occasion to confuse them 
Three measures of dispersion in a sample are of interest at ibis time 
The range is defined and denoted as in Sect 2 3 2 The mean deviation about 
the sample median, dn, is defined by 

, (sa 

and the variance s* of a sample is defined by 

.1 - 2(JC>- 
a — 1 


( 53 ) 



It should be observed that $d_m$, $x_m$, $s^2$, $\bar{x}$, and $(n - 1)$ replace
$\delta_m$, $\mu_m$, $\sigma^2$, $\mu$, and $N$, respectively. In case it seems more
natural to use n as the divisor in defining $s^2$, we point out that $n - 1$ is
used in order that $s^2$ have one of the very desirable properties of statistics
to be discussed in Sect. 5.6.2, namely, that the expected value (mean) of all
statistics of a given kind computed from samples of size n be equal to the
corresponding parameter.
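
The statistics defined in this section can be computed directly. The following
is a minimal Python sketch, not part of the original text. The sample values
are hypothetical, the helper name `mean_deviation_about_median` is an
illustrative choice, and the divisor n in the mean deviation is an assumption
based on the definitions above.

```python
from statistics import mean, median, variance


def mean_deviation_about_median(values):
    """Average absolute distance of the observations from the sample median."""
    x_m = median(values)
    # The divisor n (rather than n - 1) is an assumption of this sketch.
    return sum(abs(x - x_m) for x in values) / len(values)


# Hypothetical sample of size n = 6 (not taken from the text).
sample = [3, 7, 1, 9, 5, 7]

x_bar = mean(sample)                       # sample mean
x_m = median(sample)                       # n is even: average of the two middle values
sample_range = max(sample) - min(sample)   # range
d_m = mean_deviation_about_median(sample)  # mean deviation about the median
s2 = variance(sample)                      # sample variance, divisor n - 1

print(x_bar, x_m, sample_range, d_m, s2)
```

Note that `statistics.variance` uses the divisor n - 1, in agreement with the
definition of the sample variance, whereas `statistics.pvariance` uses the
divisor N appropriate to a population.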

It should be observed that a sample is finite and that the values of a 
sample constitute a discrete set, no matter whether they have been drawn 
from a discrete or a continuous distribution. When our only interest is in 
describing the sample, it makes no difference whether the values have been 
drawn from a discrete or continuous distribution. However, when our inter- 
est is in making a statement about the population from which the sample 
was drawn, we must distinguish between the two cases. Thus, we consider 
separately sampling from a finite and discrete population and sampling 
from an infinite and discrete or continuous population. 

5.3. SAMPLING DISTRIBUTIONS DERIVED FROM A FINITE POPULATION 

We may examine variation in statistics of one kind (say, means of samples 
of size n drawn by the same method from one population) from either the
experimental or the mathematical point of view. Both approaches are in 
common use, and something is to be gained by studying each. We illustrate 
both methods in a simple case in order to compare them. Afterwards, we 
devote most of our discussion to mathematical sampling. 

There are many methods by which we may draw a sample of n objects
from a population. In this section we consider the two most useful methods. 

Let the observations in the finite population be denoted by $x_1, x_2, \ldots, x_N$.
Some of these values may be the same numerically. We say that an observation
is randomly selected or is a random observation if each of the N observations
in the population has an equal chance of being (is equally likely to
be) selected.

Let S denote a sample of n observations with $x'_1$ denoting the first
observation selected, $x'_2$ the second observation selected, ..., $x'_n$ the nth
observation selected. Assume that each observation $x'_i$ (i = 1, ..., n) is
randomly selected. We say that S is a random sample with replacement or a
simple random sample if $x'_1$ is returned to the population before $x'_2$ is
selected, $x'_2$ is returned to the population before $x'_3$ is selected, ...,
$x'_{n-1}$ is returned to the population before $x'_n$ is selected. On the other
hand, if, for each i (i = 1, 2, ..., n - 1), $x'_i$ is not replaced before
$x'_{i+1}$ is drawn, we say that S is a random sample without replacement.

It should be noted that the probability of selecting each member in a
simple random sample remains the same, but the probability of selecting $x'_i$
in a random sample without replacement changes according to how many
observations have already been drawn. That is, in obtaining a sample of
size n, the population remains the same for each selection in the first case,
but the population changes for each selection in the second case.
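
The two methods of drawing can be imitated on a computer. The sketch below is
an illustration only, using Python's standard `random` module; the population
values and the seed are arbitrary choices, not taken from the text.

```python
import random

population = [1, 3, 5, 7, 9]
rng = random.Random(0)  # fixed seed so that the run can be repeated

# With replacement: every draw is made from the full population,
# so the same observation may be selected more than once.
with_replacement = rng.choices(population, k=3)

# Without replacement: each draw removes an observation,
# so no observation can be selected twice.
without_replacement = rng.sample(population, k=3)

print(with_replacement, without_replacement)
```

Here `choices` leaves the population unchanged from draw to draw, while
`sample` shrinks it by one observation after each draw.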

5.3.1. Sampling Without Replacement

First, consider sampling without replacement. Let five disks of the same
kind marked 1, 3, 5, 7, and 9, respectively, be placed in a bowl. The
distribution of this population is shown in Table 5.1. After the disks have been
thoroughly mixed, two are drawn one after another (or simultaneously), the
numbers recorded, and the disks then returned to the bowl. Then the disks
are thoroughly mixed again, two are drawn, and the numbers are recorded.
This process can be repeated as many times as we wish. This is an
illustration of experimental sampling without replacement from a finite
population of five different numbers.


Table 5.1

Discrete Uniform Population of Disks

Number on Disk    Frequency
      1               1
      3               1
      5               1
      7               1
      9               1


Note that drawing one disk after another for two draws may be thought
of as leading to exactly the same sample as drawing two simultaneously,
provided the disks are not mixed between the first and second draws.
However, if the disks are mixed between draws, we might expect that on a given
trial the two methods would lead to different samples, but it is assumed that
in the "long run" the two methods of drawing two disks lead to approximately
the same collection of samples. Verification of this assumption is
left to the student.

Example 5.1. Draw 100 experimental random samples of size two without
replacement from the population in Table 5.1 and compute the mean $\bar{x}$ of
each sample. Construct a table showing the distribution of $\bar{x}$.

One set of 100 random samples without replacement is shown in Table
5.2. The frequency and relative frequency distributions of $\bar{x}$ are shown in
Table 5.3 and represent an approximation to the sampling distribution of $\bar{x}$.
We call this an experimental or empirical sampling distribution.

Table 5.2

Random Samples Without Replacement Drawn
from the Population in Table 5.1

No.  Sample    No.  Sample    No.  Sample    No.  Sample
  1   1, 3      26   9, 1      51   3, 7      76   7, 9
  2   3, 5      27   7, 3      52   3, 1      77   9, 3
  3   9, 5      28   7, 1      53   5, 9      78   5, 9
  4   3, 1      29   9, 1      54   7, 3      79   5, 1
  5   7, 5      30   7, 5      55   7, 5      80   9, 3
  6   9, 3      31   5, 1      56   5, 9      81   7, 1
  7   7, 9      32   9, 5      57   7, 3      82   1, 5
  8   5, 9      33   7, 1      58   9, 5      83   1, 3
  9   7, 3      34   1, 3      59   7, 3      84   5, 7
 10   1, 5      35   7, 9      60   5, 1      85   7, 5
 11   9, 7      36   9, 1      61   5, 1      86   1, 9
 12   3, 1      37   7, 5      62   7, 5      87   9, 5
 13   3, 1      38   7, 5      63   9, 1      88   9, 1
 14   3, 7      39   7, 3      64   3, 5      89   3, 7
 15   5, 7      40   7, 3      65   7, 9      90   1, 7
 16   9, 7      41   9, 3      66   1, 7      91   1, 5
 17   3, 1      42   7, 3      67   5, 7      92   1, 7
 18   5, 9      43   9, 5      68   7, 1      93   5, 3
 19   7, 9      44   1, 9      69   7, 1      94   5, 3
 20   9, 5      45   7, 5      70   9, 3      95   1, 7
 21   9, 7      46   3, 5      71   5, 9      96   3, 1
 22   7, 5      47   5, 1      72   9, 5      97   9, 1
 23   9, 1      48   3, 7      73   7, 3      98   9, 1
 24   9, 5      49   9, 3      74   9, 7      99   7, 3
 25   9, 3      50   7, 5      75   7, 9     100   7, 9


Table 5.3

Experimental Sampling Distribution of $\bar{x}$ Obtained from Table 5.2

$\bar{x}$    Frequency    Rel. Freq.
  2           9          0.09
  3           8          0.08
  4          13          0.13
  5          24          0.24
  6          20          0.20
  7          14          0.14
  8          12          0.12


If another 100 samples were selected by the same method and a relative
frequency table of their $\bar{x}$'s constructed, we would expect to get a slightly
different approximation to the sampling distribution of $\bar{x}$. If two such
distributions are noticeably different, we could obtain a more stable
experimental sampling distribution by increasing the number of samples to, say,
500; that is, we would expect less variation from one experimental sampling
distribution of 500 $\bar{x}$'s to another than we would for experimental
sampling distributions determined from 100 samples. In any case, it seems
reasonably clear that we should be able to choose enough samples so as to be
satisfied that the resulting experimental sampling distribution gives a good
idea (approximation) of how means of samples of size two change.
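
The disk experiment can also be imitated by simulation. The sketch below is an
illustration only and does not reproduce the samples of Table 5.2; the function
name `experimental_distribution` and the seed are hypothetical choices.

```python
import random
from collections import Counter


def experimental_distribution(population, num_samples, rng):
    """Relative frequencies of the means of samples of size two,
    each sample drawn without replacement."""
    counts = Counter()
    for _ in range(num_samples):
        first, second = rng.sample(population, 2)  # two distinct disks
        counts[(first + second) / 2] += 1
    return {m: counts[m] / num_samples for m in sorted(counts)}


disks = [1, 3, 5, 7, 9]
rng = random.Random(1)  # arbitrary seed, for repeatability only

dist_100 = experimental_distribution(disks, 100, rng)
dist_500 = experimental_distribution(disks, 500, rng)

for m in sorted(dist_500):
    print(m, dist_100.get(m, 0.0), dist_500[m])
```

Running the simulation with several seeds gives a feel for how much more the
relative frequencies based on 100 samples vary than those based on 500.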

The theoretical sampling distribution or, for short, sampling distribution
obtained by mathematical methods is the model for an experimental sampling
distribution. The mathematical method is illustrated in Example 5.2. After
discussing the theoretical sampling distribution, we show how this
distribution can be used to make predictions about what actually happens in
experimental sampling from a finite population.

Example 5.2. Find the sampling distribution of means of samples of size
two drawn without replacement from the finite population (disks marked
1, 3, 5, 7, and 9, respectively) given in Table 5.1 and Example 5.1.

All possible samples which can be drawn from this population without
replacement are

1, 3    1, 5    1, 7    1, 9    3, 5
3, 7    3, 9    5, 7    5, 9    7, 9

These ten samples are equally likely events. The means of these ten samples
are distributed as shown in Table 5.4 and this is the sampling distribution
of means required.


Table 5.4

Theoretical Sampling Distribution Without Replacement
of $\bar{x}$ Obtained from Table 5.1

$\bar{x}$    Frequency    Rel. Freq.
  2           1          0.10
  3           1          0.10
  4           2          0.20
  5           2          0.20
  6           2          0.20
  7           1          0.10
  8           1          0.10

The distribution of Table 5.4 is clearly a probability distribution; that
is, it gives probabilities. The third column gives the probability of the
corresponding mean in the first column. Further, the probability of obtaining
any subset of means is found by adding the appropriate probabilities in the
third column.

Example 5.3. What is the probability of obtaining a mean of 3, 5, or 7
(that is, a value which is found in the original population) if a sample of
size two is drawn without replacement from the population of Table 5.1?
It follows that $P[\bar{x} = 3 \text{ or } 5 \text{ or } 7] = 0.10 + 0.20 + 0.10 = 0.40$.

The distribution of Table 5.4 also serves as a limiting distribution (or
model) of an experimental sampling distribution one would get by drawing 
more and more samples of size two. Comparing Tables 5.3 and 5.4, we see 
that some relative frequencies in Table 5.3 are near (in value to) those in 
Table 5.4, whereas others are not so near. However, if we found an experi- 
mental sampling distribution of means by drawing 500 samples, we could be 
“almost certain” that this distribution of means would be closer to the
theoretical sampling distribution than an experimental sampling distri- 
bution of 100 means. As the number of samples of size two gets larger 
and larger, we can be “almost sure” that the corresponding experimental 
sampling distribution gets nearer and nearer the sampling distribution. 
Thus, it is in this sense that the theoretical sampling distribution is a limiting 
distribution for the experimental sampling distribution. Hence, by discussing 
the theoretical sampling distribution alone, we can get a clear picture of the 
nature of the variability in means computed from random samples drawn 
without replacement.

We used sample means to compare experimental and theoretical sampling 
distributions, even though the comparison could have been made with the 
use of any other statistic. Since the mean is a fairly simple statistic, and since 
it plays such a central role in applications, we shall continue to use it in 
much of our discussion on sampling. 

By comparing the third columns of Tables 5.3 and 5.4, we have already 
noted differences in the relative frequencies of the experimental sampling 
distribution of 100 means and the theoretical sampling distribution. In 
Example 5.4 we examine differences in their means and variances. 

Example 5.4. Compute the mean and variance for the experimental 
sampling distribution of Table 5.3; for the sampling distribution of Table 5.4. 
Compare these means and variances. 

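Denoting the mean and variance of the experimental sampling distribution
by $\bar{\bar{x}}$ and $s_{\bar{x}}^2$, we find that

$$\bar{\bar{x}} = \frac{\sum f\bar{x}}{100} = \frac{528}{100} = 5.28$$

and

$$s_{\bar{x}}^2 = \frac{\sum f\bar{x}^2 - \dfrac{(\sum f\bar{x})^2}{100}}{99} = \frac{3090 - \dfrac{(528)^2}{100}}{99} = 3.05$$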

Note that only a sample of values (means) is used to compute the mean
and variance. Thus the notation and formulas for samples are required.
However, for the sampling distribution shown in Table 5.4 all the values
(means) are known, and thus the mean and variance are parameters of a
population and require the notation $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}^2$. We find that

$$\mu_{\bar{x}} = \frac{\sum f\bar{x}}{10} = \frac{50}{10} = 5.00 \quad\text{and}\quad \sigma_{\bar{x}}^2 = \frac{\sum f(\bar{x} - \mu_{\bar{x}})^2}{10} = \frac{30}{10} = 3.00$$

Both the mean and variance for the experimental sampling distribution
of 100 means are larger than they are for the theoretical sampling
distribution. Such would not always be true. Actually, we expect $\bar{\bar{x}}$ to be
smaller than $\mu_{\bar{x}}$ about as often as it is larger, and we expect the same
thing to be true of the variances. Furthermore, we expect $\bar{\bar{x}}$ to get closer
to $\mu_{\bar{x}}$ “almost always” and $s_{\bar{x}}^2$ to get closer to $\sigma_{\bar{x}}^2$
“almost always” as the experimental sampling distribution gets larger.

So far, we have compared two sampling distributions, empirical and
theoretical, with the sample size fixed at two. It is more important to study
the relationship between the population and the theoretical sampling
distribution of some statistic, while allowing the sample size n to vary. In terms
of parameters, we wish to find the relation between $\mu_{\bar{x}}$ and $\mu$ (or $\mu_x$)
and between $\sigma_{\bar{x}}^2$ and $\sigma^2$ (or $\sigma_x^2$), where $\mu$ and $\sigma^2$ denote the
mean and variance of the population from which the sampling distribution of
$\bar{x}$ was determined.

Example 5.5. Compute $\mu$ and $\sigma^2$ for the population given in Table 5.1.
Use $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}^2$ found in Example 5.4 to compare $\mu$ and $\mu_{\bar{x}}$,
$\sigma^2$ and $\sigma_{\bar{x}}^2$.
Using Table 5.1, we find that $\mu = 5.00$ and

$$\sigma^2 = \frac{165}{5} - (5.00)^2 = 8.00$$

Thus, for a particular population, a particular statistic, and a particular
sample size, we see that the means, $\mu$ and $\mu_{\bar{x}}$, of the two distributions are
the same and that the variance $\sigma_{\bar{x}}^2$ of the sampling distribution is
considerably smaller than the variance $\sigma^2 = 8.00$ of the population. As a matter
of fact, $\sigma^2$ and $\sigma_{\bar{x}}^2$ satisfy the equation

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$$

where N denotes the population size and n the sample size, for

$$3.00 = \frac{8.00}{2}\cdot\frac{5-2}{5-1}$$

Note. Since the sampling distribution of means $\bar{x}$ is also a population of
$\bar{x}$'s, the original population from which the samples are drawn is sometimes
called the parent population of x's.

The relationships pointed out in Example 5.5 for a particular case are
true in the general case. The general statement is found in Theorem 5.1.

The standard deviation of the sampling distribution of means is sometimes
called the standard error of the mean. In general, the standard
deviation of the sampling distribution of any statistic t is called the
standard error of that statistic and denoted by $\sigma_t$.

Theorem 5.1. If N denotes the size of any finite population and n the size
of a sample selected without replacement, then for all possible samples of size
n the mean of the means $\mu_{\bar{x}}$ is equal to the population mean $\mu$; the variance
of means $\sigma_{\bar{x}}^2$ is equal to the population variance $\sigma^2$ times a factor
$(N - n)/[n(N - 1)]$; that is

$$\mu_{\bar{x}} = \mu \tag{5.4}$$

and

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} \tag{5.5}$$


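Proof. If the population values are denoted by

$$x_1, x_2, \ldots, x_N$$

then all possible samples of size n may be indicated by $(x_1, \ldots, x_{n-1}, x_n)$,
$(x_1, x_2, \ldots, x_{n-1}, x_{n+1})$, ..., $(x_{N-n+1}, x_{N-n+2}, \ldots, x_N)$. There are
$\binom{N}{n}$ such samples, each having the same probability $p_i = 1/\binom{N}{n}$, where
$i = 1, 2, \ldots, \binom{N}{n}$. Now $\mu_{\bar{x}}$ is the mean of the means of all these
samples; that is

$$\mu_{\bar{x}} = \bar{x}_1 p_1 + \bar{x}_2 p_2 + \cdots + \bar{x}_{\binom{N}{n}} p_{\binom{N}{n}}$$

or

$$\mu_{\bar{x}} = \frac{1}{\binom{N}{n}}\left[\frac{x_1 + x_2 + \cdots + x_n}{n} + \frac{x_1 + x_2 + \cdots + x_{n-1} + x_{n+1}}{n} + \cdots + \frac{x_{N-n+1} + x_{N-n+2} + \cdots + x_{N-1} + x_N}{n}\right]$$

We must now count the number of times each $x_i$ occurs. For a particular
$x_i$ which occurs in a sample of size n, there are (n - 1) other x's which may
be selected from N - 1 values; that is, there are $\binom{N-1}{n-1}$ ways of selecting
the other (n - 1) x's. Since each $x_i$ occurs the same number of times, we
may write

$$\mu_{\bar{x}} = \frac{\binom{N-1}{n-1}}{n\binom{N}{n}}\,(x_1 + \cdots + x_N) = \frac{x_1 + \cdots + x_N}{N} = \mu$$

which is Eq. (5.4). Further, the variance of the means is

$$\sigma_{\bar{x}}^2 = \frac{1}{\binom{N}{n}}\left[\left(\frac{x_1 + \cdots + x_n}{n}\right)^2 + \cdots + \left(\frac{x_{N-n+1} + \cdots + x_N}{n}\right)^2\right] - \left(\frac{x_1 + \cdots + x_N}{N}\right)^2$$

Two types of terms are involved in squaring these sums, namely, $x_i^2$ and
$2x_ix_j$ $(i \ne j)$. We know that each $x_i^2$ in the square brackets occurs
$\binom{N-1}{n-1}$ times by using an argument similar to that used in obtaining
Eq. (5.4). By the same kind of argument we find that a particular product
$x_ix_j$ in the square brackets occurs $\binom{N-2}{n-2}$ times. Since this is true for
every pair $x_i$ and $x_j$ $(i \ne j)$, we may write

$$\sigma_{\bar{x}}^2 = \left[\frac{\binom{N-1}{n-1}}{n^2\binom{N}{n}} - \frac{1}{N^2}\right](x_1^2 + \cdots + x_N^2) + \left[\frac{2\binom{N-2}{n-2}}{n^2\binom{N}{n}} - \frac{2}{N^2}\right](x_1x_2 + \cdots + x_{N-1}x_N) \tag{5.6}$$

Since

$$\frac{\binom{N-1}{n-1}}{n^2\binom{N}{n}} - \frac{1}{N^2} = \frac{1}{nN} - \frac{1}{N^2} = \frac{N-n}{nN^2} \tag{5.7}$$

and

$$\frac{2\binom{N-2}{n-2}}{n^2\binom{N}{n}} - \frac{2}{N^2} = 2\left[\frac{n-1}{nN(N-1)} - \frac{1}{N^2}\right] = \frac{-2(N-n)}{nN^2(N-1)} \tag{5.8}$$

we may, after substituting Eqs. (5.7) and (5.8) in Eq. (5.6), write

$$\sigma_{\bar{x}}^2 = \frac{N-n}{nN^2}\,(x_1^2 + \cdots + x_N^2) - \frac{2(N-n)}{nN^2(N-1)}\,(x_1x_2 + \cdots + x_{N-1}x_N)$$

or

$$\sigma_{\bar{x}}^2 = \frac{N-n}{n(N-1)}\left[\frac{N-1}{N^2}\,(x_1^2 + \cdots + x_N^2) - \frac{2}{N^2}\,(x_1x_2 + \cdots + x_{N-1}x_N)\right] \tag{5.9}$$

But

$$\sigma^2 = \frac{N\sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}{N^2} = \frac{x_1^2 + \cdots + x_N^2}{N} - \left(\frac{x_1 + \cdots + x_N}{N}\right)^2$$

or

$$\sigma^2 = \frac{N-1}{N^2}\,(x_1^2 + \cdots + x_N^2) - \frac{2}{N^2}\,(x_1x_2 + \cdots + x_{N-1}x_N) \tag{5.10}$$

Hence, on substituting Eq. (5.10) in Eq. (5.9), we get

$$\sigma_{\bar{x}}^2 = \frac{N-n}{n(N-1)}\,\sigma^2$$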

It should be noted that nothing was said about the form of the parent
population, and that $\sigma_{\bar{x}}^2 < \sigma^2$ when the sample size is greater than one.
Further, note that when N is large when compared to n, the factor
$(N - n)/(N - 1)$ is nearly one, but always less than one. Thus

$$\sigma_{\bar{x}}^2 \approx \frac{\sigma^2}{n} \tag{5.11}$$

is a good approximation when n is small when compared to N; that is, the
approximation in Eq. (5.11) is good when $(N - n)/(N - 1)$ is near one.
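
Theorem 5.1 can be checked numerically for the disk population by listing
every sample. The Python sketch below is an illustration under that
assumption; the helper names are hypothetical, and exact fractions are used so
that the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import combinations


def mean_and_variance(values):
    """Mean and variance of a complete (finite) population of values."""
    size = len(values)
    mu = Fraction(sum(values), size)
    var = sum((x - mu) ** 2 for x in values) / size
    return mu, var


def sampling_distribution_of_means(population, n):
    """Means of all samples of size n drawn without replacement."""
    return [Fraction(sum(s), n) for s in combinations(population, n)]


population = [1, 3, 5, 7, 9]
N = len(population)
mu, sigma2 = mean_and_variance(population)

for n in range(1, N + 1):
    mu_xbar, sigma2_xbar = mean_and_variance(
        sampling_distribution_of_means(population, n)
    )
    # Right-hand side of Eq. (5.5).
    predicted = sigma2 / n * Fraction(N - n, N - 1)
    print(n, mu_xbar, sigma2_xbar, predicted)
```

For n = 2 the loop prints a mean of 5 and a variance of 3, in agreement with
Example 5.4.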


5.3.2. Sampling with Replacement 

We have considered in some detail sampling from a finite population
without replacement. Next we consider the problem of sampling from a
finite population with replacement. This method is not as useful in a practical
sense as the one already discussed. However, when it comes to sampling
from an infinite population, sampling with replacement is the method most
commonly used, at least in the theoretical sampling distribution. We
introduce this method of sampling now, using a finite population in order to make
clear the technique and to have a connecting link between sampling without
replacement from a finite population and sampling with replacement from
an infinite population. Example 5.6 illustrates experimental sampling with
replacement.

Example 5.6. Draw 100 random samples of size two with replacement
from the population in Table 5.1 and compute the mean of each sample.
Construct a table showing the experimental sampling distribution of 100 $\bar{x}$'s.

Place five disks marked 1, 3, 5, 7, and 9, respectively, in a bowl and
mix thoroughly. Draw one disk, record its marking, replace it in the bowl,
and mix again. Draw a second disk, record its marking, and replace it in
the bowl. This gives a random sample of size two with replacement. Repeating
this process 99 times, we obtain the 100 samples shown in Table 5.5, and
after computing the 100 means we construct the sampling distribution of
means shown in Table 5.6. If another 100 such sample means were determined
and the relative-frequency table constructed, we would expect to get a
different experimental sampling distribution of $\bar{x}$. However, just as in the
illustration of sampling without replacement, we can get a very good idea
of the theoretical sampling distribution (model) by taking the number of
samples sufficiently large.

Table 5.5

Random Samples With Replacement Drawn
from the Population in Table 5.1


Table 5.6

Experimental Sampling Distribution of $\bar{x}$
Obtained from Table 5.5

$\bar{x}$    Frequency    Rel. Freq.
  1           4          0.04
  2           8          0.08
  3          13          0.13
  4          18          0.18
  5          25          0.25
  6          10          0.10
  7          15          0.15
  8           6          0.06
  9           1          0.01

The method of obtaining a theoretical sampling distribution with replace- 
ment from a finite population is illustrated in Example 5.7. 

Example 5.7. Find the sampling distribution of means of samples of 
size two drawn with replacement from the finite population (disks marked 
1, 3, 5, 7, and 9, respectively) given in Table 5.1 and Example 5.1. 

All possible samples which can be drawn from this population with 
replacement are 


1, 1    1, 3    1, 5    1, 7    1, 9
3, 1    3, 3    3, 5    3, 7    3, 9
5, 1    5, 3    5, 5    5, 7    5, 9
7, 1    7, 3    7, 5    7, 7    7, 9
9, 1    9, 3    9, 5    9, 7    9, 9


It should be noted that the same value can appear on each draw. For example, 
1, 1 and 7, 7 are possible samples. Further, 1, 3 and 3, 1 represent different
samples (ordered samples), even though they contain the same values. In 
the first sample, 1 was drawn first and then 3, but in the second sample the 
reverse is true. These 25 samples are mutually exclusive and equally likely 
and thus have the same probability. Computing the mean of each sample, 
we obtain the sampling distribution of means shown in Table 5.7. This 
distribution serves as a model for the distribution shown in Table 5.6. 
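
The 25 ordered samples and the resulting distribution of means can be
generated mechanically. The Python sketch below is an illustration only; it
uses `itertools.product`, which keeps 1, 3 and 3, 1 as different samples.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 3, 5, 7, 9]

# All ordered samples of size two drawn with replacement.
samples = list(product(population, repeat=2))

# Frequency of each possible sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in samples)

# The samples are equally likely, so relative frequency = probability.
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, p in distribution.items():
    print(m, counts[m], float(p))
```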

Table 5.7

Theoretical Sampling Distribution with Replacement
of $\bar{x}$ Obtained from Table 5.1

$\bar{x}$    Frequency    Rel. Freq.
  1           1          0.04
  2           2          0.08
  3           3          0.12
  4           4          0.16
  5           5          0.20
  6           4          0.16
  7           3          0.12
  8           2          0.08
  9           1          0.04


Example 5.8. Compare the means and variances of the distributions found
in Tables 5.6 and 5.7.

The mean and variance of the experimental sampling distribution of 100
means shown in Table 5.6 are

$$\bar{\bar{x}} = \frac{478}{100} = 4.78$$

and

$$s_{\bar{x}}^2 = \frac{2626 - \dfrac{(478)^2}{100}}{99} = 3.45$$

The mean and variance of the sampling distribution of Table 5.7 are

$$\mu_{\bar{x}} = \frac{125}{25} = 5.00$$

and

$$\sigma_{\bar{x}}^2 = \frac{100}{25} = 4.00$$

Even though the mean $\bar{\bar{x}}$ and variance $s_{\bar{x}}^2$ of the experimental sampling
distribution are not the same as the mean $\mu_{\bar{x}}$ and variance $\sigma_{\bar{x}}^2$ of the
theoretical sampling distribution, it can be shown that $\bar{\bar{x}}$ and $s_{\bar{x}}^2$ get
closer and closer to $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}^2$ “almost always” as the number of
samples drawn goes up.

It should be observed that the variance obtained when one is sampling
with replacement is larger than the variance obtained when sampling without
replacement; that is, 4.00 > 3.00. Thus, we would not expect Eq. (5.5) to
give the relation between $\sigma_{\bar{x}}^2$ and $\sigma^2$ when sampling with replacement.
Theorem 5.2 relates $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}^2$ to $\mu$ and $\sigma^2$ in this case.

Theorem 5.2. If N denotes the size of any finite population and n the size
of a sample selected with replacement, then for all possible samples of size n
the mean of the means $\mu_{\bar{x}}$ is equal to the population mean $\mu$ and the variance
of means $\sigma_{\bar{x}}^2$ is equal to the population variance $\sigma^2$ times a factor $1/n$;
that is

$$\mu_{\bar{x}} = \mu \tag{5.12}$$

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \tag{5.13}$$

Proof. Let the population values be denoted by

$$x_1, x_2, \ldots, x_N$$

Then all samples of size n may be indicated by $(x_1, x_1, \ldots, x_1, x_1)$,
$(x_1, x_1, \ldots, x_1, x_2)$, ..., $(x_1, x_1, \ldots, x_1, x_N)$, ...,
$(x_N, x_N, \ldots, x_N, x_1)$, ..., $(x_N, x_N, \ldots, x_N, x_N)$. There are $N^n$ such
samples, for the first observation may be any one of N values, the second
observation any one of N values, ..., the nth observation any one of N values.
Now each sample has the same probability $p_k = 1/N^n$ $(k = 1, \ldots, N^n)$. Hence,
the mean of the means of all these samples is given by

$$\mu_{\bar{x}} = \bar{x}_1 p_1 + \bar{x}_2 p_2 + \cdots + \bar{x}_{N^n} p_{N^n}$$

or

$$\mu_{\bar{x}} = \frac{\dfrac{x_1 + \cdots + x_1}{n} + \cdots + \dfrac{x_N + \cdots + x_N}{n}}{N^n} \tag{5.14}$$

In order to simplify this expression further, we count the number of times
each x occurs in Eq. (5.14). There are $N^n$ samples, each with n values.
Therefore, Eq. (5.14) contains $nN^n$ x-terms in all. Since there are N values
in the population, and since each x obviously occurs the same number of
times, then any particular x appears $nN^n/N = nN^{n-1}$ times. Thus,

$$\mu_{\bar{x}} = \frac{nN^{n-1}(x_1 + \cdots + x_N)}{nN^n} = \frac{x_1 + \cdots + x_N}{N} = \mu$$

The derivation of Eq. (5.13) follows a similar pattern, and will be left
as an exercise for the student. Using the variances computed in Examples
5.5 and 5.8, we see that

$$\frac{\sigma^2}{n} = \frac{8.00}{2} = 4.00 = \sigma_{\bar{x}}^2$$

which verifies Eq. (5.13) in a special case.
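
The same kind of check can be made for other sample sizes by listing all
ordered samples. The Python sketch below is an illustration only, with
hypothetical helper names; it confirms Eqs. (5.12) and (5.13) for the disk
population with n = 1, 2, and 3.

```python
from fractions import Fraction
from itertools import product


def mean_and_variance(values):
    """Mean and variance of a complete (finite) population of values."""
    size = len(values)
    mu = Fraction(sum(values), size)
    var = sum((x - mu) ** 2 for x in values) / size
    return mu, var


population = [1, 3, 5, 7, 9]
mu, sigma2 = mean_and_variance(population)

results = {}
for n in (1, 2, 3):
    # Means of all N**n ordered samples drawn with replacement.
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    results[n] = mean_and_variance(means)
    print(n, len(means), results[n][0], results[n][1], sigma2 / n)
```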

Example 5.9. The probability that a sample mean falls between two
limits depends on the method of sampling, the size of the sample, and the
population from which the sample is drawn. For the parent population given
in Table 5.1 and a sample of size two, determine the probability that a sample
mean $\bar{x}$ falls between 3 and 7 inclusive when the sample is randomly drawn
(a) without replacement, (b) with replacement.
Using Table 5.4, we find for (a) that

$$P[3 \le \bar{x} \le 7] = P[\bar{x} = 3] + \cdots + P[\bar{x} = 7] = 0.10 + \cdots + 0.10 = 0.80$$

For (b), using Table 5.7, we obtain $P[3 \le \bar{x} \le 7] = 0.76$. Clearly, we expect
the probability to be smaller for the sampling distribution with replacement,
since the means are more dispersed.

Example 5.10. What is the probability that a random sample of size two
drawn from the population of Table 5.1 has a mean falling between
$\mu_{\bar{x}} - 2\sigma_{\bar{x}}$ and $\mu_{\bar{x}} + 2\sigma_{\bar{x}}$ when the sample is drawn (a) without
replacement, (b) with replacement?

According to Example 5.4, the mean $\mu_{\bar{x}}$ and variance $\sigma_{\bar{x}}^2$ for the sampling
distribution without replacement given in Table 5.4 are 5.00 and 3.00,
respectively. Thus $\sigma_{\bar{x}} = 1.73$ and the limits for (a) are $5.00 \pm 2(1.73)$, or
1.54 and 8.46. Using Table 5.4, we find the probability to be
$P[1.54 < \bar{x} < 8.46] = 1.00$. The mean and variance of the sampling
distribution with replacement are 5.00 and 4.00, respectively. Thus $\sigma_{\bar{x}} = 2.00$
and the required probability when Table 5.7 is used is
$P[5.00 - 2(2.00) < \bar{x} < 5.00 + 2(2.00)] = P[1 < \bar{x} < 9] = 0.92$.
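
The two probabilities of Example 5.9 can be recomputed from the lists of all
possible samples. The Python sketch below is an illustration only; the
function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def distribution_of_means(samples):
    """Probability of each sample mean when all samples are equally likely."""
    counts = Counter(Fraction(sum(s), len(s)) for s in samples)
    return {m: Fraction(c, len(samples)) for m, c in counts.items()}


def probability_between(distribution, low, high):
    """P[low <= mean <= high], limits included."""
    return sum(p for m, p in distribution.items() if low <= m <= high)


population = [1, 3, 5, 7, 9]
without_repl = distribution_of_means(list(combinations(population, 2)))
with_repl = distribution_of_means(list(product(population, repeat=2)))

p_a = probability_between(without_repl, 3, 7)  # part (a)
p_b = probability_between(with_repl, 3, 7)     # part (b)
print(float(p_a), float(p_b))
```

The helper `probability_between` can be reused with other limits, provided one
keeps in mind that it includes both end points.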

Example 5.11. What size sample must be drawn from a population of
size N in order for the standard deviation of the sampling distribution of
means to be half the standard deviation of the parent population?

Let $\sigma$ denote the standard deviation of the parent population. Then
$\sigma_{\bar{x}} = \sigma/2$. The sample size depends on the method of sampling. When
sampling with replacement we have $\sigma_{\bar{x}}^2 = \sigma^2/n$. Thus

$$\frac{\sigma^2}{4} = \frac{\sigma^2}{n} \quad\text{and}\quad n = 4$$

When sampling without replacement we have

$$\frac{\sigma^2}{4} = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} \quad\text{or}\quad (N - 1)n = 4(N - n)$$

Thus in our problem

$$n = \frac{4N}{N + 3}$$

It is very interesting to note that the sample size does not depend on
the population size when one is sampling with replacement, but it does
depend on the population size when one is sampling without replacement.
For when N = 9, in Example 5.11, the sample size must be four when we
are sampling with replacement and three when we are sampling without
replacement; when N = 3, the sample sizes are four and two, respectively.

Further, note that the variance of the sampling distribution of means 
can be made as small as we wish simply by choosing n large enough. 
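
The sample sizes quoted for Example 5.11 can be checked with a few lines of
Python. The sketch below is an illustration only, with hypothetical function
names. It uses the solution $n = 4N/(N + 3)$ of $(N - 1)n = 4(N - n)$ and
confirms that the variance factor of Eq. (5.5) then equals 1/4.

```python
from fractions import Fraction


def variance_factor_without_replacement(N, n):
    """Ratio sigma_xbar^2 / sigma^2 from Eq. (5.5)."""
    return Fraction(N - n, n * (N - 1))


def sample_size_without_replacement(N):
    """Solution n = 4N / (N + 3) of (N - 1) n = 4 (N - n)."""
    return Fraction(4 * N, N + 3)


sizes = {N: sample_size_without_replacement(N) for N in (9, 3)}
for N, n in sizes.items():
    # The ratio should be 1/4, i.e. the standard deviation is halved.
    print(N, n, variance_factor_without_replacement(N, n))
```

For population sizes at which $4N/(N + 3)$ is not a whole number, no sample
size gives exactly half the standard deviation.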

So far, the sampling distributions of means determined from samples 
drawn from finite populations are the only particular sampling distributions 
discussed. Sampling distributions of other statistics are considered in the 
next set of exercises, and those determined from infinite populations are 
discussed in Sect. 5.4. Sampling distributions derived from particular popula- 
tions are discussed in later sections. 

5.3.3. Exercises 

5.1. The observations of a sample of size seven are arranged in increasing order
of magnitude and denoted by $x_1, x_2, \ldots, x_7$, respectively. (a) Which, if
any, of the following are statistics?

(i) (x, + x,)l2 (vi) (2 jci + ;icj + Xs + 2xj)l6 

(ii) (atj -4- • • • -f Xi)l5 (vii) Xt 

(iii) Xb - Xs (viii) (I/x, + . . • + l/x^)/7 

(iv) [(ATe + Xr) - (x, + Xt)]/2 (ix) (xf -f xD/2 

(v) 7(x, + Xr)/2 (x) (Xi + Xf -h Xb)/3 

(b) Which, if any, of the values in (a) may be used as measures of central
tendency? As measures of dispersion? (c) Give a generalized expression
for each of the values in (a).

Hint. Let $x_1, x_2, \ldots, x_n$ denote an ordered sample and consider cases
where n is odd and even.

5.2. The five disks of a finite population are marked 1, 4, 7, 10, 13. List all 
samples of size three which can be drawn from this population without 
replacement and use in the calculations of (a), (b), (c), (d), (e), (f), and 
(g): (a) Find the sampling distribution of means, (b) Find the sampl- 
ing distributions of totals, (c) Find the sampling distribution of medians, 
(d) Find the sampling distribution of ranges, (e) Find the sampling 
distribution of variances. (1 ) Find the sampling distribution of standard 
deviations, (g) Find the sampling distribution of (x, + x,)l2, where Xi 
and Xi denote the smallest and largest values of a sample, (h) Find the 
means of the sampling distributions in (a), (b), (c), (d), (e), (f), and (g). 
(i) Compute the mean, median, range, variance, and standard deviation 
of the parent population and compare each with the appropriate values 
in (h). (j) Find the variances of the sampling distributions in (a), (b), and 
(g). Find the variance of the parent population and compare with each 
of the variances in (a), (b), and (g). 
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5 J The four disks of a finite population are marked 1 , 4, 7, and 10 List all 
samples of size three which can be drawn from this population with 
replacement and use to answer (a), (b). (c), (d) (e), (f). (g). (h), (i) and (j) 
of Exercise S 2 

5 4 The SIX disks of a finite population are marked 1, 1, t. 3, 3, and S List 
all samples of size two which can be drawn from this population without 
replacement and use to answer (a), (b>, (c), (d), (e), (0. (g) (h), (i), and 
(j) of Exercise 5 2 

5 5 Work Exercise 5 4 if the samples arc drawn with replacement 
5 6 Prose Eq (3 13) of Theorem 5 2 

5 7 Use Exercise 3 2 to find the probability that a sample drawn at random 
from the parent population has the following statistic within two standard 
deviations (in units of the statistic) of the mean of the sampling distn 
button (a) mean (b) median, (c) (jr, + Xj)'2 
58 Use Exercise S 3 to arswer (a) tb). and (c) of Exercise 5 7 
5 9 Use Exercise 5 4 to answer (a) (b) and (c) of Exercise 5 7 
5 JO What size sample must be drawn wiihout replacement fromapopulaiion 
of size tJ in order for the standard deviation of the sampling distribution 
of means to be l/Aih the sianda'd deviation of the parent population'* 
5 !l Assume (hat three finite populations of sizes 1000, 10,000 and 100000 
respectively have the same variance A random sampleofsize SOisdrawn 
without replacement from each of these populations Compute ct fot 
each and compare their magnitudes 

5.12. The discrete uniform distribution has the density function $f(x) = 1/m$,
where x = 1, 2, ..., m. Samples of size n are drawn with replacement.
Determine the sampling distribution of means.

5.13. Work Exercise 5.12 if samples of size n are drawn without replacement.

5.14. Determine the sampling distribution of ranges of samples of size n drawn
with replacement from the population of Exercise 5.12.

5.15. Determine the sampling distribution of medians of samples of size n
drawn without replacement from the population of Exercise 5.12.

5.16. (a) Find the mean and variance of the sampling distribution of means
in Exercise 5.12. (b) What proportion of sample means lie within one
standard deviation of the population mean?

5.17. Use Exercise 5.14 to determine what proportion of ranges of samples of
size n are greater than m/2; less than m/4.

5.18. A discrete distribution has density function $f(x) = (x - 2)^2/31$,
x = 1, 2, ..., 6. Samples of size n are drawn with replacement. (a) Determine
the sampling distribution of totals. (b) Determine the sampling
distribution of medians. (c) Find the mean and variance of the sampling
distribution in (a); in (b). (d) Find an interval which contains 90 per cent
of the means of samples of size n drawn from the parent population. What
are the limits of the shortest such interval?

5.19. If samples of size n are drawn with replacement from the dichotomous
distribution whose density function is given in Eq. (3.11) and the total
number of successes is denoted by T, then the density function of the
sampling distribution of T, according to Formula (4.12), is

$$f(T) = \frac{n!}{T!\,(n-T)!}\,p^{T}q^{\,n-T} \tag{5.15}$$

where T = 0, 1, ..., n.

Thus, it is clear that the binomial distribution may be considered a
sampling distribution of totals obtained from n drawings from a dichotomous
population. Show that the mean $\mu_T$ and variance $\sigma_T^2$ of the binomial
distribution are given by

$$\mu_T = np \quad\text{and}\quad \sigma_T^2 = npq = np(1-p) \tag{5.16}$$

5.20. The moment generating function (MGF), $M_x(t)$, of the variable x with
discrete density function f(x) is defined by

$$M_x(t) = \sum_x e^{tx} f(x) \tag{5.17}$$

where the summation is over all values of x for which $f(x) \ne 0$. [See
Formula (3.43) of Exercise 3.47 for further information.] Use the moment
generating function for the binomial distribution in Exercise 5.19 to
find $\mu_T$ and $\sigma_T^2$.


5.21. Let samples of size n be drawn without replacement from a finite
dichotomous population of size N in which there are Np “successes” and Nq
“failures” (p + q = 1). Then prove that the relative frequency of x
successes and n − x failures in a sample of size n (i.e., the hypergeometric
density function of x) is given by

$$f(x) = h(x, n-x; N, p, q) = \frac{\binom{Np}{x}\binom{Nq}{n-x}}{\binom{N}{n}} \tag{5.18}$$

Note. The distribution derives its name from the fact that the values
f(x) can be expressed as successive terms of a hypergeometric series.

5.22. Show that for the hypergeometric distribution

$$\mu_x = np \quad\text{and}\quad \sigma_x^2 = np(1-p)\,\frac{N-n}{N-1} \tag{5.19}$$

The mean is the same as for the binomial distribution, but the variance
is less. It can be shown that when Np and N approach infinity in such a
manner that the ratio p = Np/N becomes fixed, the hypergeometric
distribution reduces to the binomial distribution. Further, if n/N < 0.1,
p < 0.1, and N is fairly large, the Poisson distribution may be used as
a satisfactory approximation in most cases.
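
The relations for the hypergeometric mean and variance can be checked by direct enumeration. The sketch below is only an illustration: the population (N = 50 with 20 successes, samples of size 10) and the helper names `hypergeometric` and `mean_var` are assumptions introduced here, and the variance is compared with the standard form $np(1-p)(N-n)/(N-1)$.

```python
from fractions import Fraction
from math import comb

def hypergeometric(N, successes, n):
    """Probability of x successes in a sample of n drawn without replacement
    from N items of which `successes` are successes."""
    lo = max(0, n - (N - successes))
    hi = min(n, successes)
    return {x: Fraction(comb(successes, x) * comb(N - successes, n - x), comb(N, n))
            for x in range(lo, hi + 1)}

def mean_var(dist):
    """Mean and variance of a discrete distribution {value: probability}."""
    m = sum(x * p for x, p in dist.items())
    return m, sum((x - m) ** 2 * p for x, p in dist.items())

# Hypothetical population, chosen only for illustration.
N, Np, n = 50, 20, 10
p = Fraction(Np, N)
m, v = mean_var(hypergeometric(N, Np, n))

print(m == n * p)                                      # same mean as the binomial
print(v == n * p * (1 - p) * Fraction(N - n, N - 1))   # finite population factor
print(v < n * p * (1 - p))                             # smaller than the binomial variance
```

Exact fractions are used so that the comparisons are equalities rather than floating-point approximations.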
5.23. A dichotomous distribution of size eight contains three defective and
five nondefective parts. (a) Use Eq. (5.18) to find the frequency distribution
of the number of defective parts in a random sample (without replacement)
of size five. (b) Compute the mean and variance for this distribution and
show that the results obtained agree with those obtained by using Eq.
(5.19). In the past the hypergeometric distribution has most often been
used in connection with acceptance sampling, but this is by no means the
only area of application.

5.24. There are many experiments in which we require more than two
categories of classification. For example, in response to a question one might
obtain answers such as “yes,” “no,” “don't know,” and “no answer”;
in tossing a die there are six categories. Thus, we consider generalizations
of the dichotomous type distributions of Exercises 5.19, 5.20, 5.21, 5.22,
and 5.23.

Suppose that there are k mutually exclusive and exhaustive possible
outcomes $O_1, O_2, \ldots, O_k$ of an experiment for which the probabilities are
$p_1, p_2, \ldots, p_k$, respectively, and

$$\sum_{i=1}^{k} p_i = 1$$

Let samples of size n be drawn with replacement from such a population.
Let $x_i$ denote the number of times outcome $O_i$ occurs (i = 1, ..., k).
Prove that the joint density function $f(x_1, \ldots, x_k)$ for the random
variables $x_1, \ldots, x_k$ is

$$f(x_1, \ldots, x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k} \tag{5.20}$$

where each $x_i$ may range from zero to n inclusive and

$$\sum_{i=1}^{k} x_i = n$$

Since the terms in the expansion of

$$(p_1 + p_2 + \cdots + p_k)^n = 1$$

are those given by Eq. (5.20), we call this distribution the multinomial
distribution.

It should be noted that Eq. (5.20) is a joint density function of only
k − 1 random variables, since the kth x is exactly determined by the
relation

$$x_k = n - \sum_{i=1}^{k-1} x_i$$

when the other k − 1 x's are specified. Since the binomial density was
written as f(x) rather than f(x, n − x), we may write the multinomial
density as $f(x_1, \ldots, x_{k-1})$ in place of $f(x_1, \ldots, x_{k-1}, x_k)$ or

$$f\Bigl(x_1, \ldots, x_{k-1},\; n - \sum_{i=1}^{k-1} x_i\Bigr)$$

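5.25. Show that for the multinomial distribution

$$\mu_{x_i} = np_i, \qquad \operatorname{cov}(x_i, x_j) = \sigma_{x_i x_j} = -np_ip_j \quad (i \ne j;\ i, j = 1, \ldots, k) \tag{5.21}$$

Hint. The MGF, $M = M(t_1, \ldots, t_{k-1})$, given by

$$M = \sum \exp(t_1x_1 + \cdots + t_{k-1}x_{k-1})\,f(x_1, \ldots, x_{k-1}) \tag{5.22}$$

may prove useful. It is to be understood that the summation is over all
values of $x_1, \ldots, x_{k-1}$ for which $f(x_1, \ldots, x_{k-1}) \ne 0$.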

5.26. Let samples of size n be drawn without replacement from the parent
distribution with k categories defined in Exercise 5.24. Let $x_i$ denote the
number of times outcome $O_i$ occurs (i = 1, ..., k). Prove that the
probability density function $f(x_1, \ldots, x_k) = h(x_1, \ldots, x_k; N, p_1, \ldots, p_k, n) = h$
is given by

$$h = \frac{\binom{Np_1}{x_1}\binom{Np_2}{x_2}\cdots\binom{Np_k}{x_k}}{\binom{N}{n}} \tag{5.23}$$

where each $x_i$ may range from zero to n inclusive and

$$\sum_{i=1}^{k} x_i = n$$

This expression, Eq. (5.23), is an extension of Eq. (5.18), the density
function for the hypergeometric distribution.

5.27. In a human population of size ten it is known that four answer “no,” 
five answer “yes,” and one answers “don’t know” in response to a certain 
question, (a) Use Eq. (5.23) to find the frequency distribution of answers 
if a sample of size four is drawn at random (without replacement) from 
this population, (b) Find the mean and variance for the individuals 
answering “yes.” What is the covariance of the “yes” and “no” answers? 


5.4. SAMPLING FROM AN INFINITE POPULATION 

It is possible to discuss both experimental and theoretical sampling from
an infinite population just as we did with a finite population. We may think
of the sampling distribution of some statistic determined from experimental
sampling as approximating the sampling distribution of the same statistic
determined from theoretical sampling, the theoretical sampling distribution
being thought of as a model. Since experimental sampling from an infinite
population is carried on as it is for finite populations, we limit most of our
discussion in this chapter to theoretical sampling distributions.

Before we continue, it should be observed that infinite populations fall
into three classes. A population may take on (1) every value on a continuum,
(2) a countable discrete set of values, or (3) infinitely many values, only k
of them being different. The normal distribution is an illustration of (1),
the Poisson distribution is an example of (2), and a die in which every
number may be thought of as occurring infinitely many times illustrates (3).

In this section we discuss sampling distributions in terms of means, just
as we did when sampling from finite populations. In particular, we first
observe how Formula (5.5) in Theorem 5.1, relating variances of a finite
parent population and the corresponding theoretical sampling distribution
of means, enables us to extend the relationship to populations with infinitely
many discrete values [types (2) and (3) in the above paragraph]. For as N
becomes infinitely large and we sample without replacement, we have
$\sigma_{\bar{x}}^2 = \sigma^2/n$, since (N − n)/(N − 1) has 1 as a limit. This is the relationship
stated by Eq. (5.13) of Theorem 5.2. Thus, when sampling without
replacement from an infinite discrete population, we are led to the same result as
sampling with replacement from a finite population. Further, it will be
shown that this relationship between two variances exists when one is sampling
with or without replacement from any infinite population, including type (1)
above.
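
The limiting argument can be illustrated numerically. The following sketch assumes that Formula (5.5) has the form $\sigma_{\bar{x}}^2 = (\sigma^2/n)(N-n)/(N-1)$, as the limit quoted above suggests. The function name `variance_of_mean` and the values σ² = 8, n = 16 are chosen here only for illustration.

```python
from fractions import Fraction

def variance_of_mean(sigma2, n, N=None):
    """Variance of the sample mean. N=None stands for an infinite population
    (or sampling with replacement); otherwise the factor (N - n)/(N - 1)
    for sampling without replacement from N items is applied."""
    base = Fraction(sigma2) / n
    if N is None:
        return base
    return base * Fraction(N - n, N - 1)

sigma2, n = 8, 16
limit = variance_of_mean(sigma2, n)          # sigma^2 / n
for N in (20, 100, 10_000, 1_000_000):
    v = variance_of_mean(sigma2, n, N)
    # The ratio v / limit is the factor (N - n)/(N - 1), which tends to 1.
    print(N, float(v), float(v / limit))
```

As N grows with n fixed, the printed ratio moves toward 1, so the finite-population variance approaches σ²/n.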

It is of interest to consider the connection between random sampling
with and without replacement. Conceptually, drawing a random sample
without replacement from a bowl containing indefinitely many disks marked
1, indefinitely many marked 2, ..., indefinitely many marked k is the same
as drawing a random sample with replacement from a bowl containing k
disks marked 1, 2, ..., k, respectively. In particular, drawing a random
sample of two objects with replacement from a finite population is the
same as drawing a sample of two objects without replacement from an infinite
population.

In an infinite population the probability of drawing a particular value
is zero. Hence, the probability of drawing a second value is not affected by
the first draw. It follows that random drawings from an infinite population
are independent of each other. It is in this sense that the two methods of
sampling already discussed amount to the same thing when one is sampling
from an infinite population. So in the remainder of this book, unless it is
otherwise indicated, when we speak of sampling at random from an infinite
population, we may mean either sampling with or without replacement.

We have already noted that for infinite discrete populations Eq. (5.13)
of Theorem 5.2 can be obtained from Eq. (5.5) of Theorem 5.1 by letting
N become infinitely large. However, Eq. (5.13) along with Eq. (5.12) can be
obtained by another method. This method indicates how the corresponding
relations may be derived for the continuous population; that is, the
forthcoming method serves as a bridge between finite discrete and continuous
cases and illustrates a method which is very important to the mathematics
of theoretical sampling.

Let an infinitely large discrete population have the following distribution,
f(x) being the density function:

    x        x_1       x_2       ...    x_k
    f(x)     f(x_1)    f(x_2)    ...    f(x_k)

We wish to determine the mean $\mu_{\bar{x}}$ and variance $\sigma_{\bar{x}}^2$ of the theoretical
sampling distribution of means of all random samples of size two. Let $x_i$ and $x_j$
(i, j = 1, 2, ..., k) denote the values resulting from the first and second draws
of the random variable x. Since $f(x_i)$ and $f(x_j)$ denote the probabilities of the
occurrence of $x_i$ and $x_j$, respectively, and since $x_i$ and $x_j$ are independent,
the probability $f(x_i, x_j)$ of the joint occurrence of $x_i$ and $x_j$ is $f(x_i) \cdot f(x_j)$,
by Eq. (4.3a). Thus, the probability $f[(x_i + x_j)/2]$ of obtaining the mean
$(x_i + x_j)/2$ computed from this sample is $f(x_i) \cdot f(x_j)$. If we use Eq. (3.7),
the mean of all possible means is given by

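$$\begin{aligned}
\mu_{\bar{x}} &= \sum \bar{x}\,f(\bar{x}) = \sum_{i=1}^{k}\sum_{j=1}^{k} \frac{x_i + x_j}{2}\,f(x_i)\,f(x_j) \\
&= \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} x_i f(x_i)\,f(x_j) + \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} x_j f(x_i)\,f(x_j) \\
&= \frac{1}{2}\sum_{i=1}^{k} x_i f(x_i) \cdot \sum_{j=1}^{k} f(x_j) + \frac{1}{2}\sum_{j=1}^{k} x_j f(x_j) \cdot \sum_{i=1}^{k} f(x_i)
\end{aligned}$$

or

$$\mu_{\bar{x}} = \frac{1}{2}\sum_{i=1}^{k} x_i f(x_i) + \frac{1}{2}\sum_{j=1}^{k} x_j f(x_j) \tag{5.24}$$

since

$$\sum_{i=1}^{k} f(x_i) = \sum_{j=1}^{k} f(x_j) = 1 \tag{5.25}$$

By definition, the mean of the given distribution is

$$\mu = \sum_{i=1}^{k} x_i f(x_i) \quad\text{or}\quad \mu = \sum_{j=1}^{k} x_j f(x_j) \tag{5.26}$$

Substituting Eq. (5.26) in Eq. (5.24) gives

$$\mu_{\bar{x}} = \tfrac{1}{2}\mu + \tfrac{1}{2}\mu = \mu \tag{5.27}$$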

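Further, by Formula (3.9), the variance of the theoretical sampling
distribution of means is given by

$$\sigma_{\bar{x}}^2 = \sum_{i=1}^{k}\sum_{j=1}^{k} \left(\frac{x_i + x_j}{2}\right)^2 f(x_i)\,f(x_j) - \mu_{\bar{x}}^2$$

or

$$\sigma_{\bar{x}}^2 = \frac{1}{4}\Bigl[\sum_i x_i^2 f(x_i) \cdot \sum_j f(x_j) + 2\sum_i x_i f(x_i) \cdot \sum_j x_j f(x_j) + \sum_j x_j^2 f(x_j) \cdot \sum_i f(x_i)\Bigr] - \mu^2 \tag{5.28}$$

Substituting Eqs. (3.9), (5.25), and (5.26) in Eq. (5.28) gives

$$\sigma_{\bar{x}}^2 = \tfrac{1}{4}\bigl[(\sigma^2 + \mu^2) + 2\mu\cdot\mu + (\sigma^2 + \mu^2)\bigr] - \mu^2$$

or, on reduction

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{2} \tag{5.29}$$

By a similar method it can be shown that for samples of size n

$$\mu_{\bar{x}} = \mu \tag{5.30}$$

and

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \tag{5.31}$$

when samples are drawn with or without replacement from an infinitely
large discrete population.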
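
The double-sum argument can be mirrored by brute-force enumeration. The sketch below is illustrative only: the parent distribution and the function names `moments` and `sampling_distribution_of_means` are assumptions introduced here. It builds the sampling distribution of means from every ordered sample of independent draws and checks that its mean is μ and its variance is σ²/n for n = 2 and n = 3.

```python
from fractions import Fraction
from itertools import product

def moments(dist):
    """Mean and variance of a discrete distribution {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    var = sum(x * x * p for x, p in dist.items()) - mean ** 2
    return mean, var

def sampling_distribution_of_means(dist, n):
    """Distribution of the mean of n independent draws from dist."""
    out = {}
    for draws in product(dist.items(), repeat=n):
        xbar = sum(Fraction(x) for x, _ in draws) / n
        prob = Fraction(1)
        for _, p in draws:
            prob *= p               # independence: probabilities multiply
        out[xbar] = out.get(xbar, 0) + prob
    return out

# Hypothetical parent distribution, chosen only for illustration.
parent = {1: Fraction(1, 2), 2: Fraction(1, 3), 6: Fraction(1, 6)}
mu, sigma2 = moments(parent)

for n in (2, 3):
    mu_xbar, var_xbar = moments(sampling_distribution_of_means(parent, n))
    print(n, mu_xbar == mu, var_xbar == sigma2 / n)
```

Because the draws are independent, the joint probability of each ordered sample is the product of the individual probabilities, which is exactly the step taken from Eq. (4.3a) in the derivation.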

Next we determine the mean and variance of the theoretical sampling
distribution of means of all random samples of size two drawn from an
infinitely large population with random variable x ($-\infty < x < \infty$) having
continuous density function f(x). Denote the first and second draws of a
random sample by $x_1$ and $x_2$, respectively. Since $x_1$ is a random variable,
it can take any value of x in the interval $(-\infty, \infty)$, and it has density function
$f(x_1)$, where $f(x_1)$ is of the same form as f(x). Thus, $f(x_1)\,\Delta x_1$ is the probability
that $x_1$ falls between the limits of any interval $\Delta x_1$. Similarly, $x_2$ is a random
variable with density function $f(x_2)$, having the same form as f(x), and
$f(x_2)\,\Delta x_2$ is the probability that $x_2$ falls between the limits of any interval $\Delta x_2$.
Since $x_1$ and $x_2$ are independent, the probability $f(x_1, x_2)\,\Delta x_1\,\Delta x_2$ that a
random sample will simultaneously have the first value between the limits
of $\Delta x_1$ and the second value between the limits of $\Delta x_2$ is $f(x_1)\,\Delta x_1 \cdot f(x_2)\,\Delta x_2$,
by Eq. (4.3a). Thus, the probability of the mean $(x_1 + x_2)/2$,
computed from the random sample $x_1$ and $x_2$, falling in the interval
$\Delta[(x_1 + x_2)/2]$ determined from the joint intervals $\Delta x_1$ and $\Delta x_2$ (area
geometrically) is

$$f\!\left(\frac{x_1 + x_2}{2}\right)\Delta\!\left(\frac{x_1 + x_2}{2}\right) = f(x_1)\,\Delta x_1 \cdot f(x_2)\,\Delta x_2$$

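From this argument, if we use Eq. (3.20), it follows that the mean of the
means is given by

$$\begin{aligned}
\mu_{\bar{x}} &= \int_{-\infty}^{\infty} \bar{x}\,f(\bar{x})\,d\bar{x} \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{x_1 + x_2}{2}\,f(x_1)\,f(x_2)\,dx_1\,dx_2 \\
&= \frac{1}{2}\Bigl[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 f(x_1) f(x_2)\,dx_1\,dx_2 + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 f(x_1) f(x_2)\,dx_1\,dx_2\Bigr] \\
&= \frac{1}{2}\Bigl[\int_{-\infty}^{\infty} x_1 f(x_1)\,dx_1 \int_{-\infty}^{\infty} f(x_2)\,dx_2 + \int_{-\infty}^{\infty} x_2 f(x_2)\,dx_2 \int_{-\infty}^{\infty} f(x_1)\,dx_1\Bigr] \\
&= \frac{1}{2}\Bigl[\int_{-\infty}^{\infty} x_1 f(x_1)\,dx_1 + \int_{-\infty}^{\infty} x_2 f(x_2)\,dx_2\Bigr]
\end{aligned}$$

or

$$\mu_{\bar{x}} = \mu$$

since

$$\int_{-\infty}^{\infty} f(x_1)\,dx_1 = \int_{-\infty}^{\infty} f(x_2)\,dx_2 = 1$$

and

$$\int_{-\infty}^{\infty} x_1 f(x_1)\,dx_1 = \int_{-\infty}^{\infty} x_2 f(x_2)\,dx_2 = \mu$$

The variance of the theoretical sampling distribution of means of samples
of size two, if we use Eqs. (3.22), (3.17), and (3.20), is

$$\begin{aligned}
\sigma_{\bar{x}}^2 &= \int_{-\infty}^{\infty} \bar{x}^2 f(\bar{x})\,d\bar{x} - \mu_{\bar{x}}^2 \\
&= \frac{1}{4}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1^2 + 2x_1x_2 + x_2^2)\,f(x_1)\,f(x_2)\,dx_1\,dx_2 - \mu^2 \\
&= \frac{1}{4}\Bigl[\int_{-\infty}^{\infty} x_1^2 f(x_1)\,dx_1 + 2\int_{-\infty}^{\infty} x_1 f(x_1)\,dx_1 \int_{-\infty}^{\infty} x_2 f(x_2)\,dx_2 + \int_{-\infty}^{\infty} x_2^2 f(x_2)\,dx_2\Bigr] - \mu^2 \\
&= \tfrac{1}{4}\bigl[(\sigma^2 + \mu^2) + 2\mu\cdot\mu + (\sigma^2 + \mu^2)\bigr] - \mu^2
\end{aligned}$$

or

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{2}$$

The reader should check each step in these last two derivations to be sure
the reasons are clear. The proofs are given in great detail so that a comparison
may be made with the proofs involving Σ symbols.

The general proof follows a similar argument leading to the results of
Theorem 5.3.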

Theorem 5.3. Let x be a continuous random variable distributed with mean
μ, variance σ², and density function f(x). If random samples of size n are drawn
from this distribution, then the sampling distribution of means has mean $\mu_{\bar{x}}$
equal to the population mean and variance $\sigma_{\bar{x}}^2$ equal to the population variance
times a factor of 1/n; that is

$$\mu_{\bar{x}} = \mu \quad\text{and}\quad \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$



Thus, sampling at random with replacement from a finite population or
sampling at random from an infinite population leads to the same results
concerning $\sigma_{\bar{x}}^2$. That is, the variance of the sample mean is equal to the
population variance divided by the sample size, provided the population variance
is finite.

This is an extremely important property in the application of statistics,
since it means that the distribution of the sample mean becomes more and
more concentrated about the population mean (since $\mu_{\bar{x}} = \mu$). Thus, “as
the sample size increases we become more certain that the sample mean is
a good estimate of the population mean.” This is a rough statement of the
law of large numbers in terms of $\bar{x}$. One form of a more precise statement
is that

$$P[-k < \bar{x} - \mu < k] \ge 1 - \frac{\sigma^2}{nk^2} \tag{5.32}$$

where k is any positive real number. Formula (5.32) is known as Chebyshev's
inequality. [For other forms of Eq. (5.32) and more discussion, see the next
set of exercises.] It should be observed at this time that the random variable
need not be $\bar{x}$.

Without knowing the form of the distribution of $\bar{x}$, it is possible, if the
variance is known, to determine a sample size n such that any desired
proportion of the sample means falls within k units of the true mean. The
following example illustrates this.

Example 5.12. Determine the smallest random sample for which the
probability is at least 0.90 that the average number of defectives drawn from
a dichotomous population with true proportion defective p unknown differs
from p by not more than 0.1.

From Eqs. (3.13) and (3.14) we have μ = p and σ² = p(1 − p). Now n
must be large enough so that the following inequality holds:

$$P[-0.1 < \bar{x} - \mu < 0.1] \ge 0.90$$

But from Eq. (5.32) we obtain

$$P[-k < \bar{x} - \mu < k] \ge 1 - \frac{\sigma^2}{nk^2}$$

or

$$P[-0.1 < \bar{x} - \mu < 0.1] \ge 1 - \frac{p(1-p)}{n(0.1)^2}$$

Hence we must find n such that

$$1 - \frac{p(1-p)}{n(0.1)^2} = 0.90$$

or

$$n = 1000\,p(1-p)$$

We must find n such that the inequality (5.32) is satisfied no matter what the
value of p. It is clear that $p = \tfrac{1}{2}$ makes $p(1-p) = \tfrac{1}{4}$ a maximum and gives
the required value

$$n = 250$$

Later we shall find n by taking into account the sampling distribution of
the proportion of defectives. Using this knowledge shows that a much smaller
sample size would suffice. But Chebyshev's inequality has the advantage
that it does not depend on the distribution of the random variable (so long
as the random variable has finite mean and variance).
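
The arithmetic of this example is easily mechanized. The sketch below is illustrative, and the function name `chebyshev_sample_size` is an assumption introduced here. It solves $1 - \sigma^2/(nk^2) \ge P$ for the smallest integer n and evaluates it for several values of p, with k = 0.1 and P = 0.90 as in the example.

```python
import math
from fractions import Fraction

def chebyshev_sample_size(sigma2, k, prob):
    """Smallest integer n with 1 - sigma2 / (n * k**2) >= prob."""
    n = Fraction(sigma2) / (Fraction(k) ** 2 * (1 - Fraction(prob)))
    return math.ceil(n)

k, prob = Fraction(1, 10), Fraction(9, 10)
for p in (Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)):
    # For a dichotomous population the variance is p(1 - p).
    print(float(p), chebyshev_sample_size(p * (1 - p), k, prob))
```

The required n is largest at p = 1/2, where it equals 250, which is why that value guards against every possible p.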

If we restrict our attention to the sampling distribution of the mean (or 
total), there is a property which is much more noteworthy and useful than 
the Chebyshev form of the statement of the law of large numbers. It is the 
most important theorem in statistics; it is the theorem which gives the normal
distribution such a central place in both the theory and application of
statistics. The following statement of this theorem, called the central-limit
theorem, is one of many forms in which it is written.

Theorem 5.4 (Central-limit Theorem). Let the random variable x be
distributed with mean μ and variance σ² (but with density function unknown).
Then the distribution of the sample mean $\bar{x}$ is very closely approximated by
the normal distribution with mean μ and variance σ²/n when n is large.

A general proof of this theorem is beyond the scope of this book.
However, a restricted proof is outlined in Exercise 5.69. Among the
mathematicians who have worked on various phases of this fundamental theorem are
De Moivre, Laplace, Gauss, Chebyshev, Liapounoff, Lindeberg, Lévy, Feller,
and Cramér. Laplace [20] first stated the theorem in 1812. In 1901 Liapounoff
[23, 24] gave a rigorous proof under fairly general conditions. Feller,
Khintchine, Lévy, and others [11, 12, 19, 22] found most general conditions
under which the theorem is valid.

It is an amazing fact that nothing is required of the form of the
distribution function except that the variance be finite. In application, the
requirement of finite variance is no real restriction, since in almost any practical
problem the range of the variate is finite. This implies that the variance is
necessarily finite.

One using this theorem would like to know how large n must be before
the normal approximation is adequate. There is no simple answer. The size
of n depends on the shape of the parent distribution. Unless the parent
distribution is unusual in shape, a sample no smaller than 30 should furnish a
reasonable approximation. Of course, in cases where it is known that the
parent distribution is normal or approximately normal, fewer than 30
observations might be used with discretion.

We have already observed the piling-up effect of sample means in the
neighborhood of the population mean in Fig. 5.1 and Table 5.7. In these
places there is even a hint that the limiting distribution might be
approximately normal. Assuming that the distribution of means of samples of size
sixteen drawn from the population (1, 3, 5, 7, 9) given in Table 5.1 yields a fair
approximation to the normal distribution, we consider the following
example.

Example 5.13. The mean of the population of Table 5.1 is μ = 5. If
means are computed for samples of size sixteen randomly drawn with
replacement, use Theorem 5.4 to determine a symmetrical interval about μ
in which 90 per cent of the means fall.

From Example 5.5 we find the population variance to be σ² = 8.
According to Theorem 5.4, $\bar{x}$ is approximately normally distributed with $\mu_{\bar{x}} = 5$
and $\sigma_{\bar{x}} = \sqrt{8/16} = 0.707$. For the standard normal distribution the interval
from $t_1 = -1.645$ to $t_2 = 1.645$ contains 90 per cent of the t values (see
Example 3.7). We wish to find two values $\bar{x}_1$ and $\bar{x}_2$ such that
$P[\bar{x}_1 \le \bar{x} \le \bar{x}_2] = 0.90$. That is, $\bar{x}_1$ and $\bar{x}_2$ must be determined from

$$\frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \pm 1.645$$

or

$$\bar{x} = \pm 1.645\,\sigma_{\bar{x}} + \mu_{\bar{x}}$$

Thus, the required interval is $3.84 \le \bar{x} \le 6.16$.

The student could make some judgment as to the goodness of the
normal approximation by drawing, say, 100 samples of size sixteen and
computing their means. If roughly 90 of these means fall in the interval
$3.84 \le \bar{x} \le 6.16$, he might think the approximation is adequate. (For some
students it might require 500 or 1000 or more samples to reach a satisfactory
decision.)
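
Instead of drawing samples, the exact sampling distribution can be built by repeated convolution and the probability of the interval computed directly. The sketch below is only an illustration: the function name `distribution_of_total` is an assumption introduced here, and the population is taken to be the five equally likely values 1, 3, 5, 7, 9 quoted above.

```python
from fractions import Fraction

def distribution_of_total(values, n):
    """Exact distribution of the total of n equally likely draws
    (with replacement) from `values`, built by repeated convolution."""
    p = Fraction(1, len(values))
    dist = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for total, prob in dist.items():
            for v in values:
                nxt[total + v] = nxt.get(total + v, 0) + prob * p
        dist = nxt
    return dist

n = 16
totals = distribution_of_total([1, 3, 5, 7, 9], n)
lo, hi = Fraction(384, 100), Fraction(616, 100)

# Exact probability that the sample mean falls in the interval 3.84 to 6.16.
exact = sum(prob for total, prob in totals.items()
            if lo <= Fraction(total, n) <= hi)
print(float(exact))   # to be compared with the nominal 0.90
```

Because the sample mean takes only discrete values, the exact probability need not equal 0.90 exactly even when the normal approximation is good.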

Note. Usually the sampling distribution of means approaches with 
increasing n the normal curve faster in the neighborhood of the mean than 
in regions some distance from the mean. Thus, the further a point is from the 
population mean, the more slowly the sampling distribution of means can 
be expected to approach the normal curve. 

In Sect. 2.4 we learned, for finite populations, that if y = kx, k being a
nonzero constant, then $\mu_y = k\mu_x$ and $\sigma_y^2 = k^2\sigma_x^2$. These relations can be
shown to hold for discrete infinite and continuous populations. Using these
relations, we may prove Theorem 5.5.

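Theorem 5.5. Let the random variable x be distributed with mean μ,
variance σ², and density function f(x). The theoretical sampling distribution of
totals $T = \sum_{i=1}^{n} x_i$ of random samples of size n drawn from the parent population
has mean $\mu_T$ and variance $\sigma_T^2$ given by

$$\mu_T = n\mu \tag{5.33}$$

and

$$\sigma_T^2 = \begin{cases} n\sigma^2\,\dfrac{N-n}{N-1}, & \text{when sampling from a finite population without replacement} \\[6pt] n\sigma^2, & \text{otherwise} \end{cases} \tag{5.34}$$

Proof. We have immediately that

$$\mu_T = \mu_{n\bar{x}} = n\mu_{\bar{x}} = n\mu$$

and

$$\sigma_T^2 = \sigma_{n\bar{x}}^2 = n^2\sigma_{\bar{x}}^2$$

Substituting the value of $\sigma_{\bar{x}}^2$ from Theorem 5.1, 5.2, or 5.3 gives Eq. (5.34).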
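
The finite-population case of the theorem can be checked by listing every sample. The sketch below is illustrative: the helper name `mean_var` is an assumption introduced here, the population is taken to be the values 1, 3, 5, 7, 9 quoted earlier, and the variance uses the population divisor N. All samples of size two drawn without replacement are enumerated and the mean and variance of their totals are compared with nμ and nσ²(N − n)/(N − 1).

```python
from fractions import Fraction
from itertools import combinations

def mean_var(values):
    """Mean and variance of equally likely values (divisor = number of values)."""
    m = Fraction(sum(values), len(values))
    v = sum((Fraction(x) - m) ** 2 for x in values) / len(values)
    return m, v

population = [1, 3, 5, 7, 9]
N, n = len(population), 2
mu, sigma2 = mean_var(population)

# Totals of every sample of size n drawn without replacement.
totals = [sum(s) for s in combinations(population, n)]
mu_T, var_T = mean_var(totals)

print(mu_T == n * mu)
print(var_T == n * sigma2 * Fraction(N - n, N - 1))
```

Unordered samples are used because each one corresponds to the same number of ordered draws, so the distribution of totals is unchanged.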

It should be clear from the theorems of this and the last section that we 
may work with either the sampling distribution of means or the sampling 
distribution of totals, whichever seems most appropriate in a given problem. 
We now illustrate theorems and formulas of this section. 

Example 5.14. A random sample of size two is drawn from the uniform
distribution with density function

$$f(x) = \begin{cases} 1, & \text{when } -\tfrac{1}{2} \le x \le \tfrac{1}{2} \\ 0, & \text{otherwise} \end{cases}$$

Find (a) the mean and variance of the sampling distribution of means, (b)
the density function of the sampling distribution of means, (c) the probability
that a sample mean will fall between $-\tfrac{1}{4}$ and $\tfrac{1}{4}$, and (d) the probability that
a sample mean will fall within two standard errors of the population mean.

For (a) use Formulas (3.25) and (3.26) to obtain μ = 0 and $\sigma^2 = \tfrac{1}{12}$.
Then from Theorem 5.3 it follows that $\mu_{\bar{x}} = 0$ and $\sigma_{\bar{x}}^2 = \tfrac{1}{12}\cdot\tfrac{1}{2} = \tfrac{1}{24}$.

To solve (b), let $x_1$ denote the first observation and $x_2$ the second
observation in the sample. Since the sample is random, the observations are
independently distributed. Thus, since $f(x_1) = f(x)$ is the density function of $x_1$
and $f(x_2) = f(x)$ is the density function of $x_2$, the joint density function
$f(x_1, x_2)$ of $x_1$ and $x_2$ is given by $f(x_1, x_2) = f(x_1)\cdot f(x_2)$, or

$$f(x_1, x_2) = \begin{cases} 1, & \text{when } -\tfrac{1}{2} \le x_1 \le \tfrac{1}{2} \text{ and } -\tfrac{1}{2} \le x_2 \le \tfrac{1}{2} \\ 0, & \text{otherwise} \end{cases} \tag{5.35}$$

The graph of this function is shown in Fig. 5.1.

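We require the density function of $\bar{x}$. This may be found from the joint
density function of $x_1$ and $\bar{x}$. The function $f(x_1, \bar{x})$ must satisfy the relation

$$f(x_1, x_2)\,dx_1\,dx_2 = f(x_1, \bar{x})\,dx_1\,d\bar{x}$$

or

$$f(x_1, x_2)\,dx_2 = f(x_1, \bar{x})\,d\bar{x}$$

Since $\bar{x} = (x_1 + x_2)/2$, it follows that $x_2 = 2\bar{x} - x_1$ and, at a fixed value
of $x_1$, $dx_2 = 2\,d\bar{x}$. Thus

$$f(x_1, x_2)\,dx_2 = f(x_1, 2\bar{x} - x_1)\cdot 2\,d\bar{x}$$

and

$$f(x_1, \bar{x}) = f(x_1, 2\bar{x} - x_1)\cdot 2$$

or

$$f(x_1, \bar{x}) = \begin{cases} 2, & \text{when } -\tfrac{1}{2} \le x_1 \le \tfrac{1}{2} \text{ and } \tfrac{x_1}{2} - \tfrac{1}{4} \le \bar{x} \le \tfrac{x_1}{2} + \tfrac{1}{4} \\ 0, & \text{otherwise} \end{cases} \tag{5.36}$$

since $-\tfrac{1}{2} \le x_2 \le \tfrac{1}{2}$ implies $-\tfrac{1}{2} \le 2\bar{x} - x_1 \le \tfrac{1}{2}$ implies
$x_1 - \tfrac{1}{2} \le 2\bar{x} \le x_1 + \tfrac{1}{2}$ implies

$$\frac{x_1}{2} - \frac{1}{4} \le \bar{x} \le \frac{x_1}{2} + \frac{1}{4}$$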

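The graph of $f(x_1, \bar{x})$ is shown in Fig. 5.2, from which it should be noted
that the volume of the parallelepiped is one and that the base is not a
rectangle. Since $f(x_1, \bar{x})$ is a density function

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, \bar{x})\,dx_1\,d\bar{x} = 1$$

or

$$\int_{-1/2}^{0}\left(\int_{-1/2}^{2\bar{x}+(1/2)} 2\,dx_1\right)d\bar{x} + \int_{0}^{1/2}\left(\int_{2\bar{x}-(1/2)}^{1/2} 2\,dx_1\right)d\bar{x} = 1$$

Let I denote the inner integral; that is, $I = \int_{-1/2}^{2\bar{x}+(1/2)} 2\,dx_1$ when
$-\tfrac{1}{2} \le \bar{x} < 0$ and $I = \int_{2\bar{x}-(1/2)}^{1/2} 2\,dx_1$ when $0 \le \bar{x} \le \tfrac{1}{2}$. Clearly, I is a
nonnegative function for all real numbers in the domain $-\tfrac{1}{2} \le \bar{x} \le \tfrac{1}{2}$. Further

$$\int_{-1/2}^{1/2} I\,d\bar{x} = 1$$

and

$$0 \le \int_{a}^{b} I\,d\bar{x} \le 1$$

for any two real numbers a and b for which $-\tfrac{1}{2} \le a \le b \le \tfrac{1}{2}$.

[Fig. 5.2. Bivariate Uniform Distribution]

[Fig. 5.3. Graph of Eq. 5.37, the Marginal Distribution of $\bar{x}$ Obtained from Eq. 5.35]

Thus, I satisfies the conditions of a density function of $\bar{x}$. Letting this
function be denoted by $f(\bar{x})$, we may write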


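$$f(\bar{x}) = \begin{cases} \displaystyle\int_{2\bar{x}-(1/2)}^{1/2} 2\,dx_1, & \text{when } 0 \le \bar{x} \le \tfrac{1}{2} \\[8pt] \displaystyle\int_{-1/2}^{2\bar{x}+(1/2)} 2\,dx_1, & \text{when } -\tfrac{1}{2} \le \bar{x} < 0 \\[8pt] 0, & \text{otherwise} \end{cases}$$

or, on evaluation

$$f(\bar{x}) = \begin{cases} 2 - 4\bar{x}, & \text{when } 0 \le \bar{x} \le \tfrac{1}{2} \\ 2 + 4\bar{x}, & \text{when } -\tfrac{1}{2} \le \bar{x} < 0 \\ 0, & \text{otherwise} \end{cases} \tag{5.37}$$

The graph of Eq. (5.37) is shown in Fig. 5.3. The function $f(\bar{x})$ is called the
marginal density function of $\bar{x}$. A more complete discussion of density
functions obtained from joint density functions is found in the exercises which
follow.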

The above method illustrates in a simple case how density functions of
statistics may be found. Another method, the method using moment
generating functions, is widely used and, for most of the sampling distributions
required in this course, is the simpler method. This method is discussed in
a later set of exercises.

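To find the probability required in Example 5.14(c), use Eq. (5.37) and
a property of symmetry to write

$$P[-\tfrac{1}{4} \le \bar{x} \le \tfrac{1}{4}] = 2\int_{0}^{1/4} (2 - 4\bar{x})\,d\bar{x} = 2\bigl[2\bar{x} - 2\bar{x}^2\bigr]_{0}^{1/4} = \tfrac{3}{4}$$

This probability is easily verified by looking at Fig. 5.3.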

Since the density function of $\bar{x}$ is known, we find from the definition of
mean and variance that

$$\mu_{\bar{x}} = \int_{-\infty}^{\infty} \bar{x}\,f(\bar{x})\,d\bar{x} = \int_{-1/2}^{0} \bar{x}(2 + 4\bar{x})\,d\bar{x} + \int_{0}^{1/2} \bar{x}(2 - 4\bar{x})\,d\bar{x} = 0$$

and, using Formula (3.22), we obtain

$$\sigma_{\bar{x}}^2 = \int_{-\infty}^{\infty} \bar{x}^2 f(\bar{x})\,d\bar{x} - \mu_{\bar{x}}^2 = \int_{-1/2}^{0} \bar{x}^2(2 + 4\bar{x})\,d\bar{x} + \int_{0}^{1/2} \bar{x}^2(2 - 4\bar{x})\,d\bar{x} - 0 = \frac{1}{24}$$

These values check with those found earlier without the density function of
$\bar{x}$ being known. This points up the importance of Theorem 5.3.

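The standard error of the mean is $\sigma_{\bar{x}} = \sqrt{6}/12 = 0.204$. Since $\mu = \mu_{\bar{x}}$,
we may use the sampling distribution of the mean to find the probability
that a sample mean will fall within two standard errors of the population
mean. Thus

$$\begin{aligned}
P[\mu_{\bar{x}} - 2\sigma_{\bar{x}} \le \bar{x} \le \mu_{\bar{x}} + 2\sigma_{\bar{x}}] &= P\Bigl[-\tfrac{\sqrt{6}}{6} \le \bar{x} \le \tfrac{\sqrt{6}}{6}\Bigr] \\
&= 2\cdot P\Bigl[0 \le \bar{x} \le \tfrac{\sqrt{6}}{6}\Bigr] \\
&= 2\int_{0}^{\sqrt{6}/6} (2 - 4\bar{x})\,d\bar{x} \\
&= \tfrac{2}{3}\bigl[\sqrt{6} - 1\bigr] = 0.966
\end{aligned}$$

Since 96.6 per cent of the means of samples of size two fall between −0.408
and 0.408, 96.6 per cent of the totals fall between −0.816 and 0.816 when
random samples are drawn from the uniform distribution from $-\tfrac{1}{2}$ to $\tfrac{1}{2}$.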
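
The four parts of Example 5.14 can be confirmed by numerical integration of the triangular density $2 - 4\bar{x}$ for $0 \le \bar{x} \le \tfrac{1}{2}$ and $2 + 4\bar{x}$ for $-\tfrac{1}{2} \le \bar{x} < 0$. The sketch below is illustrative only: the names `f_xbar` and `integrate` are assumptions introduced here, and a simple midpoint rule stands in for the exact integrals of the text.

```python
import math

def f_xbar(x):
    """Triangular density of the mean of two uniform(-1/2, 1/2) draws."""
    if 0 <= x <= 0.5:
        return 2 - 4 * x
    if -0.5 <= x < 0:
        return 2 + 4 * x
    return 0.0

def integrate(g, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

total = integrate(f_xbar, -0.5, 0.5)                        # should be 1
mean = integrate(lambda x: x * f_xbar(x), -0.5, 0.5)        # part (a): 0
var = integrate(lambda x: x * x * f_xbar(x), -0.5, 0.5) - mean ** 2   # 1/24
se = math.sqrt(var)                                         # standard error
p_c = integrate(f_xbar, -0.25, 0.25)                        # part (c)
p_d = integrate(f_xbar, -2 * se, 2 * se)                    # part (d)

print(round(total, 4), round(mean, 4), round(var, 5), round(se, 3))
print(round(p_c, 4), round(p_d, 3))
```

An even number of steps places the kink of the density at zero on a cell boundary, so the midpoint rule is essentially exact for the piecewise linear integrands.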


5.4.1. Some Properties of Linear Combinations of Means 

In Sect. 5.4 we discussed some properties of sampling distributions of 
a single sample mean. In many problems the experimenter is interested in 
comparing several means. In particular, to compare the mean yield of process
A with the mean yield of process B he must know the nature of the
distribution of $\bar{x}_A - \bar{x}_B$, the difference in the means of samples taken from the two
processes.

Now let us determine the nature of the mean and variance of the sampling 
distribution of a linear combination of random variables. 

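Theorem 5.6. Let

$$l = \sum_{i=1}^{p} a_i y_i$$

where the $a_i$ are real constants, and the $y_i$ are random variables with means
$\mu_{y_i} = \mu_i$, variances $\sigma_{y_i}^2 = \sigma_i^2$, and covariances cov $(y_i, y_j) = \sigma_{ij}$
(i, j = 1, ..., p; i ≠ j). Then

$$\mu_l = \sum_{i=1}^{p} a_i\mu_i \tag{5.38}$$

and

$$\sigma_l^2 = \sum_{i=1}^{p} a_i^2\sigma_i^2 + 2\sum_{i<j} a_i a_j \sigma_{ij} \tag{5.39}$$

Proof. Consider the case where p = 2. Let $f(y_1, y_2)$ be the joint density
function of $y_1$ and $y_2$. Then by a generalization of Definition (3.41) with the
use of Formulas (3.53), (3.54), and (3.55), we obtain

$$\begin{aligned}
\mu_l &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (a_1y_1 + a_2y_2)\,f(y_1, y_2)\,dy_1\,dy_2 \\
&= a_1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y_1 f(y_1, y_2)\,dy_1\,dy_2 + a_2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y_2 f(y_1, y_2)\,dy_1\,dy_2 \\
&= a_1\mu_1 + a_2\mu_2
\end{aligned}$$

and

$$\begin{aligned}
\sigma_l^2 &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (l - \mu_l)^2 f(y_1, y_2)\,dy_1\,dy_2 \\
&= a_1^2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (y_1 - \mu_1)^2 f(y_1, y_2)\,dy_1\,dy_2 + a_2^2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (y_2 - \mu_2)^2 f(y_1, y_2)\,dy_1\,dy_2 \\
&\quad + 2a_1a_2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (y_1 - \mu_1)(y_2 - \mu_2)\,f(y_1, y_2)\,dy_1\,dy_2 \\
&= a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + 2a_1a_2\sigma_{12}
\end{aligned}$$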


The proof for the case of p random variables is left as an exercise for the
student. For finite populations the proofs are analogous to those for the
infinite populations.
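
For a discrete joint distribution the same identities hold with sums in place of integrals, and they can be verified exactly. The sketch below is only an illustration: the joint distribution, the constants `a1` and `a2`, and the helper `expect` are assumptions introduced here. It computes the mean and variance of $l = a_1y_1 + a_2y_2$ directly and compares them with $a_1\mu_1 + a_2\mu_2$ and $a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + 2a_1a_2\sigma_{12}$.

```python
from fractions import Fraction

# Hypothetical joint distribution of (y1, y2); the two variables are dependent.
joint = {
    (0, 0): Fraction(3, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(4, 10),
}
a1, a2 = 2, -3          # arbitrary real constants

def expect(g):
    """Expected value of g(y1, y2) under the joint distribution."""
    return sum(g(y1, y2) * p for (y1, y2), p in joint.items())

mu1 = expect(lambda y1, y2: y1)
mu2 = expect(lambda y1, y2: y2)
var1 = expect(lambda y1, y2: (y1 - mu1) ** 2)
var2 = expect(lambda y1, y2: (y2 - mu2) ** 2)
cov12 = expect(lambda y1, y2: (y1 - mu1) * (y2 - mu2))

# Moments of l computed directly from the joint distribution.
mu_l = expect(lambda y1, y2: a1 * y1 + a2 * y2)
var_l = expect(lambda y1, y2: (a1 * y1 + a2 * y2 - mu_l) ** 2)

print(mu_l == a1 * mu1 + a2 * mu2)
print(var_l == a1**2 * var1 + a2**2 * var2 + 2 * a1 * a2 * cov12)
```

The covariance in this illustration is not zero, so the cross-product term is needed; dropping it would make the second comparison fail.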

Corollary 5.1. If, in addition to the conditions of Theorem 5.6, we assume
that the $y_i$ are mutually independent, then

$$\sigma_l^2 = \sum_{i=1}^{p} a_i^2\sigma_i^2 \tag{5.40}$$

but Eq. (5.38) remains the same.

Proof. If $y_i$ is independent of $y_j$, then cov $(y_i, y_j) = 0$, and the second
term of Eq. (5.39) vanishes.

Corollary 5.2. Let $\bar{x}_i$ (i = 1, 2) be the mean of a random sample of size
$n_i$ drawn from a population with mean $\mu_{x_i} = \mu_i$ and variance $\sigma_i^2$. If $\bar{x}_1$
and $\bar{x}_2$ are independently distributed, then

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 \tag{5.41}$$

and

$$\sigma_{\bar{x}_1 - \bar{x}_2}^2 = \frac{\sigma_1^2}{n_1}\cdot\frac{N_1 - n_1}{N_1 - 1} + \frac{\sigma_2^2}{n_2}\cdot\frac{N_2 - n_2}{N_2 - 1} \tag{5.42a}$$

when random samples are drawn without replacement from finite populations
of sizes $N_1$ and $N_2$, or

$$\sigma_{\bar{x}_1 - \bar{x}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \tag{5.42b}$$

otherwise (i.e., when random samples are drawn with replacement from finite
populations or with or without replacement from infinite populations).

Proof. Let $y_1 = \bar{x}_1$ and $y_2 = \bar{x}_2$ so that $l = a_1y_1 + a_2y_2 = \bar{x}_1 - \bar{x}_2$, where
$a_1 = 1$ and $a_2 = -1$. Then on substituting these values in Eqs. (5.38) and
(5.40) and applying Theorem 5.1 or 5.2 or 5.3, we obtain Eqs. (5.41), (5.42a),
and (5.42b).
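
A short computation shows how the two forms of the variance of a difference compare. The sketch below is illustrative: the function names, process variances, and sample and population sizes are assumptions introduced here. The variance of each sample mean is taken as σ²/n, multiplied by (N − n)/(N − 1) when a finite population size is supplied, and the two variances are added as the independence assumption allows.

```python
import math

def var_of_mean(sigma2, n, N=None):
    """Variance of a sample mean; pass N for sampling without replacement
    from a finite population of size N."""
    v = sigma2 / n
    return v if N is None else v * (N - n) / (N - 1)

def var_of_difference(s1, n1, s2, n2, N1=None, N2=None):
    """Variance of the difference of two independent sample means:
    the sum of the two individual variances."""
    return var_of_mean(s1, n1, N1) + var_of_mean(s2, n2, N2)

# Hypothetical variances and sample sizes, for illustration only.
v_inf = var_of_difference(4.0, 25, 9.0, 36)                  # infinite populations
v_fin = var_of_difference(4.0, 25, 9.0, 36, N1=100, N2=200)  # finite, no replacement

print(v_inf, math.sqrt(v_inf))
print(v_fin, math.sqrt(v_fin))
```

The finite-population value is the smaller of the two, since each term is reduced by its own factor (N − n)/(N − 1).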

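Note that $\mu_{\bar{x}_1 + \bar{x}_2} = \mu_1 + \mu_2$ but $\sigma_{\bar{x}_1 + \bar{x}_2}^2 = \sigma_{\bar{x}_1 - \bar{x}_2}^2$ when $\bar{x}_1$ and
$\bar{x}_2$ are independently distributed.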

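Corollary 5.3. If, in addition to the conditions of Corollary 5.2, we assume
that $n_1 = n_2 = n$, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, and $N_1 = N_2 = N$ when the populations
are finite, then

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$$

and

$$\sigma_{\bar{x}_1 - \bar{x}_2}^2 = \frac{2\sigma^2}{n}\cdot\frac{N - n}{N - 1}$$

when samples are drawn without replacement from finite populations, or

$$\sigma_{\bar{x}_1 - \bar{x}_2}^2 = \frac{2\sigma^2}{n}$$

otherwise.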


5.4.2. Exercises 

5.28. Prove Theorem 5.3 for the case where n = 3, using the method prior to 
the statement of the theorem. 

5.29. On the average, | of the seeds of a certain strain germinate, (a) If 80 seeds 
are planted, how many on the average (can be expected to) will germinate? 
Find the variance of the number that will germinate. 

Hint. Use Eq. (5.16) of Exercise 5.19. 

(b) What is the probability that fewer than the expected number minus 
5 will germinate? Use Chebyshev’s inequality, (c) Under what conditions 
can the results of (a) and (b) be expected to be valid? 

5.30. Long experience with testing the tensile strength of a certain manufactured
fiber indicates that, for the process under standard conditions, the mean
is μ = 3 lb and the standard deviation is σ = 0.2 lb. (a) Use Chebyshev's
inequality to determine the smallest random sample for which the
probability is at least 0.90 that the sample mean will not differ from the
population mean by more than 0.05 lb. (b) Use the central-limit theorem (i.e.,
the normal approximation) to find n in (a). (c) The mean of a random
sample of size twenty-five was found to be 2.90 lb. Use Chebyshev's
inequality to determine the probability of a sample mean's being this
small or smaller. (d) Use the normal approximation to answer (c).

5.31. Work Example 5.14 after replacing “sampling distribution of means” by
“sampling distribution of totals” and “sample mean” by “sample total.”

5.32. Let a random sample of size $n_i$ be drawn from an infinite population
with mean $\mu_i$ and variance $\sigma_i^2$ (i = 1, 2, 3, 4). Let $\bar{x}_i$ denote the mean of
the sample drawn from the ith population. Assume that the sample means
$\bar{x}_1$, $\bar{x}_2$, $\bar{x}_3$, and $\bar{x}_4$ are independently distributed. (a) Determine the mean
and variance of + If, - 3J, <b) Determine the mean and variance 

of jp, + j?, - - f, (c) Determine the covariance of Jh and t e, 

find cov(jc„.jt,) (<j) Answer (a) (b). and (c) when n, *» n. >*»*)*> 
andoj = oMt - 1.2, 3.4) 

5.33. In Example 5.14 we introduced the term "marginal density function of
$x$." Let us consider the more general problem, where $f(x, y)$ is the joint
density function of two continuous variates (for the discrete case use $\sum$
in place of $\int dx$). Suppose that we are interested in finding a density
function $f_1(x)$ of the single variate $x$. Since $f_1(x)$ is to be a density function,
this means, among other things, that $f_1(x)$ must satisfy the condition

$$P(a < x < b) = \int_a^b f_1(x)\,dx$$

for any pair of constants $a$ and $b$ ($a \le b$). Now in terms of the joint
density function $f(x, y)$ we must have

$$P(a < x < b) = \int_a^b \int_{-\infty}^{\infty} f(x, y)\,dy\,dx$$

Thus, the required marginal density function is given by

$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$

since this definition allows the Conditions (3.2) of the density function
to be satisfied. Further, the cumulative marginal distribution function of
$x$, for example, is defined by

$$F_1(x) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} f(u, y)\,dy\,du = \int_{-\infty}^{x} f_1(u)\,du = F(x, \infty) \tag{5.44}$$

Similarly, the marginal density function of $y$ is given by

$$f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$

and the cumulative marginal distribution of $y$ by $F_2(y) = F(\infty, y)$.

In general, if $f(x_1, x_2, \ldots, x_k)$ is a joint density function of the variates
$x_1, \ldots, x_k$, the marginal density function of any subset of $p$ variates
$x_{i_1}, \ldots, x_{i_p}$ is given by integrating $f(x_1, \ldots, x_k)$ with respect to all other
variates between the limits $-\infty$ and $+\infty$.

Find the marginal density functions and cumulative marginal distribution
functions for the joint density function found in (a) Exercise 3.55,
(b) Exercise 3.59, (c) Sect. 3.4.2. (d) Find the marginal density functions
$f(x, y)$ and $f(x)$, using the density function $f(x, y, z)$ found in Exercise
3.63. (e) Find the marginal density function of the variates $x_1, \ldots, x_{k-2}$
for the density function given in Eq. (5.20). (f) Find the marginal density
function of the variates $x_1$, $x_2$ for the density function given in Eq. (5.23)
when $k = 5$.


5.5. PROPERTIES OF EXPECTATIONS 


Often professional gamblers pose questions which relate to how much, 
in the long run, one might expect to win in a game of chance. In fact, most 
of us pose similar questions relating to decisions we must make in our 
day-by-day activities. [We have already used the term “expected number” 
in Exercise 5.29(b)] Mathematicians and statisticians answer the question in 
terms of “expected value.” 

There are many places in statistics where it is desirable to use the term
"expected value of ____," where the blank space might be filled in with
any statistic, a function of one or more statistics such as $x_1 + 2x_2 - 3x_3$,
$e^{tx}$, etc. As a matter of fact, means, variances, moments, and moment
generating functions may all be expressed in terms of the "expected value"
notation. Further, once the properties of expectation are developed, we may
use the same notation for both the discrete and continuous cases.

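By definition, the mean or the expectation or the expected value, $E(x)$,
of a random variable $x$ with density function $f(x)$ is

$$E(x) = \mu_x = \begin{cases} \sum x f(x), & \text{for the discrete case} \\[4pt] \int_{-\infty}^{\infty} x f(x)\,dx, & \text{for the continuous case} \end{cases} \tag{5.45}$$
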


In developing the properties of expectation, we consider only the continuous 
case. It is assumed that the student can supply the notation and argument 
for the discrete case. 

Before we introduce a discussion of some desirable properties of statistics,
desirable from the point of view of estimating parameters, it will be
useful to introduce some properties of expected values which are based on
continuous functions of random variables. With this in mind, let $h(x)$ be
any continuous function of $x$ and let $x$ be a continuous random variable
with density function $f(x)$. Then the expectation or expected value $E[h(x)] = E(h)$
of $h(x)$ is given by

$$E[h(x)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx \tag{5.46}$$

In defining $k$th moments and the moment generating functions in Exercises
3.45 and 3.47, we gave special forms of Eq. (5.46). For example, when
$h(x) = x^k$, Formula (5.46) becomes

$$E(x^k) = \int_{-\infty}^{\infty} x^k f(x)\,dx \tag{5.47}$$

which is the $k$th moment about the origin defined in Eq. (3.40). Also,
when $h(x) = e^{tx}$, Formula (5.46) becomes

$$E(e^{tx}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx \tag{5.48}$$

which is the moment generating function (MGF) defined in Eq. (3.43).

In general, if $h(x_1, \ldots, x_k)$ is a continuous function of $x_1, \ldots, x_k$, and
if $x_1, \ldots, x_k$ are continuous random variables with joint density function
$f(x_1, \ldots, x_k)$, then the expectation, $E[h(x_1, \ldots, x_k)] = E(h)$, of
$h(x_1, \ldots, x_k) = h$ is given by

$$E[h(x_1, \ldots, x_k)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h(x_1, \ldots, x_k) f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k \tag{5.49}$$

In defining the mean, variance, and covariance of $x_i$ and $x_j$ for multivariate
distributions given in Sect. 3.4, we gave special forms of Eq. (5.49). For
example, when

$$h(x_1, \ldots, x_k) = (x_i - \mu_i)(x_j - \mu_j)$$

Formula (5.49) becomes

$$E[(x_i - \mu_i)(x_j - \mu_j)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_i - \mu_i)(x_j - \mu_j) f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k \tag{5.50}$$

which is the covariance, $\sigma_{ij} = \operatorname{cov}(x_i, x_j)$, of $x_i$ and $x_j$ defined in Eq. (3.55).
Note that when $i = j$ Formula (5.50) becomes the definition of the variance
of $x_i$, that is

$$\sigma_i^2 = V(x_i) = E[(x_i - \mu_i)^2] = E\{[x_i - E(x_i)]^2\} \tag{5.51}$$

Also, many of the expressions in Sect. 5.4.1 relating to linear combinations
of random variables may be considered special cases of Eq. (5.49).

We now list together some basic properties of expectation for easy
reference. It is to be understood that $a$ denotes any constant, and that $h_m$ is
an abbreviation for $h_m(x_1, \ldots, x_k)$, which is the $m$th ($m = 1, 2$) continuous
function of $x_1, \ldots, x_k$.

$$E(a) = a \tag{5.52}$$

$$E(a h_1) = a E(h_1) \tag{5.53}$$

$$E(h_1 + h_2) = E(h_1) + E(h_2) \tag{5.54}$$

$$E(x_1 + \cdots + x_k) = E(x_1) + \cdots + E(x_k) \tag{5.55}$$

$$E(x_i^2) = V(x_i) + [E(x_i)]^2 = \sigma_i^2 + \mu_i^2 \tag{5.56}$$

$$E(x_i \cdot x_j) = E(x_i) \cdot E(x_j) \tag{5.57}$$

when $x_i$ and $x_j$ ($i \ne j$) are independently distributed.

Note also that the formulas for moment generating functions as given 
in Eqs. (3.43), (3.46), (3.47), and (3.48) can be expressed in terms of expec- 
tations. These properties are not difficult to prove with the aid of the defini- 
tions of expectation given in Eqs. (5.46) and (5.49), the definitions of marginal 
density [for example, see Eq. (5.43)], and other properties given earlier. 
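
The listed properties can be checked numerically on a small discrete case, where sums replace the integrals. The following is only a sketch: the two marginal distributions `p1` and `p2` and the helper `expect` are hypothetical choices made for illustration, and independence is imposed by multiplying the marginals.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical marginal distributions of two independent discrete variates.
p1 = {0: F(1, 2), 1: F(1, 3), 2: F(1, 6)}
p2 = {1: F(1, 4), 3: F(3, 4)}

# Independence: the joint density is the product of the marginals.
joint = {(a, b): p1[a] * p2[b] for a, b in product(p1, p2)}

def expect(h):
    """E[h(x1, x2)] for the discrete joint density (a sum replaces the integral)."""
    return sum(h(a, b) * p for (a, b), p in joint.items())

e1 = expect(lambda a, b: a)               # E(x1)
e2 = expect(lambda a, b: b)               # E(x2)
e_sum = expect(lambda a, b: a + b)        # E(x1 + x2)
e_prod = expect(lambda a, b: a * b)       # E(x1 * x2)
e_sq = expect(lambda a, b: a ** 2)        # E(x1^2)
v1 = expect(lambda a, b: (a - e1) ** 2)   # V(x1)

print(e_sum == e1 + e2)       # property (5.55)
print(e_sq == v1 + e1 ** 2)   # property (5.56)
print(e_prod == e1 * e2)      # property (5.57), which relies on independence
```

Exact rational arithmetic is used so that the equalities can be tested without rounding error.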

5.5.1. Proofs of Some Properties of Expectation 

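We illustrate the nature of the proofs for $k = 2$ by proving Eqs. (5.55)
and (5.57). By Definition (5.49) and properties of integral calculus, we may
write

$$\begin{aligned} E(x_1 + x_2) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x_1 + x_2) f(x_1, x_2)\,dx_1\,dx_2 \\ &= \int_{-\infty}^{\infty} x_1 \left[ \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 \right] dx_1 + \int_{-\infty}^{\infty} x_2 \left[ \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1 \right] dx_2 \end{aligned}$$

The marginal density functions, according to Eq. (5.43), are

$$g_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 \quad \text{and} \quad g_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1$$

Thus, by substitution we find that

$$E(x_1 + x_2) = \int_{-\infty}^{\infty} x_1 g_1(x_1)\,dx_1 + \int_{-\infty}^{\infty} x_2 g_2(x_2)\,dx_2$$

Since $g_1(x_1)$ and $g_2(x_2)$ are density functions, it follows from the definition
of expectation that

$$E(x_1 + x_2) = E(x_1) + E(x_2)$$

Also

$$E(x_1 \cdot x_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1 x_2 f(x_1, x_2)\,dx_1\,dx_2$$

Since $x_1$ and $x_2$ are independently distributed

$$f(x_1, x_2) = f_1(x_1) \cdot f_2(x_2)$$

and

$$\begin{aligned} E(x_1 \cdot x_2) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1 x_2 f_1(x_1) f_2(x_2)\,dx_1\,dx_2 \\ &= \int_{-\infty}^{\infty} x_1 f_1(x_1)\,dx_1 \cdot \int_{-\infty}^{\infty} x_2 f_2(x_2)\,dx_2 \\ &= E(x_1) \cdot E(x_2) \end{aligned}$$

It should be noted that Eq. (5.57) requires the independence assumption,
but Eq. (5.55) does not.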

The properties of expectation are particularly useful in proving properties
of variance. Four important formulas are

$$V(x + a) = V(x) \tag{5.58}$$

$$V(ax) = a^2 V(x) \tag{5.59}$$

$$V(x_1 + x_2) = V(x_1) + V(x_2) \tag{5.60}$$

if $x_1$ and $x_2$ are independent, and

$$V(a_1 x_1 + \cdots + a_n x_n) = a_1^2 V(x_1) + \cdots + a_n^2 V(x_n) \tag{5.61}$$

if the $x_i$ are mutually independent and at least one constant $a_i \ne 0$ (see
Corollary 5.1).

To prove Eq. (5.58) we write, using Eqs. (5.51), (5.55), and (5.52)

$$\begin{aligned} V(x + a) &= E\{[x + a - E(x + a)]^2\} \\ &= E\{[x + a - E(x) - a]^2\} \\ &= E\{[x - E(x)]^2\} \\ &= V(x) \end{aligned}$$

The proof of Eq. (5.60) is slightly more involved. Using the above properties
and the fact that $E(x_1) = \mu_1$ and $E(x_2) = \mu_2$ are constants, we may write

$$\begin{aligned} V(x_1 + x_2) &= E\{[x_1 + x_2 - E(x_1 + x_2)]^2\} \\ &= E\{[x_1 - E(x_1) + x_2 - E(x_2)]^2\} \\ &= E[(x_1 - \mu_1)^2 + 2(x_1 - \mu_1)(x_2 - \mu_2) + (x_2 - \mu_2)^2] \\ &= E[(x_1 - \mu_1)^2] + 2E[(x_1 - \mu_1)(x_2 - \mu_2)] + E[(x_2 - \mu_2)^2] \\ &= V(x_1) + V(x_2) \end{aligned}$$

since, by the independence of $x_1$ and $x_2$ and Eq. (5.57),

$$\begin{aligned} E[(x_1 - \mu_1)(x_2 - \mu_2)] &= E(x_1 x_2 - \mu_1 x_2 - \mu_2 x_1 + \mu_1 \mu_2) \\ &= E(x_1 x_2) - E(\mu_1 x_2) - E(\mu_2 x_1) + E(\mu_1 \mu_2) \\ &= E(x_1) E(x_2) - \mu_1 E(x_2) - \mu_2 E(x_1) + \mu_1 \mu_2 \\ &= 0 \end{aligned}$$

5.5.2. Conditional Expectations 

Conditional expectations are useful when one is working with bivariate
(or multivariate) distributions. Let $f(x_1, x_2)$ denote the bivariate density
function of the continuous random variables $x_1$ and $x_2$, and let $g_1(x_1)$ and
$g_2(x_2)$ denote the marginal density functions of $x_1$ and $x_2$, respectively. Then,
using our definition of conditional probability, we define the conditional
density function of $x_1$ for a given $x_2$ by

$$f(x_1 \mid x_2) = \frac{f(x_1, x_2)}{g_2(x_2)} \tag{5.62}$$

for each $x_2$ for which $g_2(x_2) \ne 0$, and the conditional density function of $x_2$
for a given $x_1$ by

$$f(x_2 \mid x_1) = \frac{f(x_1, x_2)}{g_1(x_1)} \tag{5.63}$$

for each $x_1$ for which $g_1(x_1) \ne 0$. Hence, the corresponding conditional
expectations are defined by

$$E(x_1 \mid x_2) = \int_{-\infty}^{\infty} x_1 f(x_1 \mid x_2)\,dx_1 \tag{5.64}$$

and

$$E(x_2 \mid x_1) = \int_{-\infty}^{\infty} x_2 f(x_2 \mid x_1)\,dx_2 \tag{5.65}$$

respectively.

When $x_2$ is fixed, $E(x_1 \mid x_2)$ is a mean which is fixed. But when $x_2$ is allowed
to vary, Eq. (5.64) is a function of $x_2$, in which case it can be shown that

$$E[E(x_1 \mid x_2)] = E(x_1) \tag{5.66}$$

Likewise, when $E(x_2 \mid x_1)$ is considered a function of $x_1$, it can be shown that

$$E[E(x_2 \mid x_1)] = E(x_2) \tag{5.67}$$

In the particular case where $x_1$ and $x_2$ are independent variables, it follows
that

$$E(x_1 \mid x_2) = E(x_1) \tag{5.68}$$

and

$$E(x_2 \mid x_1) = E(x_2) \tag{5.69}$$

The proofs of Eqs. (5.66), (5.67), (5.68), and (5.69) are left as an exercise
for the student. In listing the properties of conditional expectation we
considered only the continuous case. It is assumed that the student can
supply the notation and argument for the discrete case.

The properties of this section are used extensively in the next section and
in later chapters. Thus, the reader is encouraged to work several of the
exercises on expectation at the end of this chapter.
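
For the discrete case the conditional expectation and property (5.66) can be traced through on a small table. The sketch below uses a made-up joint density (the dictionary `joint` is hypothetical, not taken from any exercise) and exact fractions; sums play the role of the integrals in Eqs. (5.62) and (5.64).

```python
from fractions import Fraction as F

# Hypothetical joint probabilities f(x1, x2), keyed by (x1, x2); they sum to 1.
joint = {
    (0, 0): F(1, 8), (1, 0): F(3, 8),
    (0, 1): F(1, 8), (1, 1): F(1, 8), (2, 1): F(1, 4),
}

# Marginal density of x2: sum the joint density over x1.
g2 = {}
for (x1, x2), p in joint.items():
    g2[x2] = g2.get(x2, 0) + p

def cond_expect_x1(x2_value):
    """E(x1 | x2): sum of x1 * f(x1, x2) / g2(x2), the discrete form of (5.64)."""
    return sum(x1 * p / g2[x2_value]
               for (x1, x2), p in joint.items() if x2 == x2_value)

# E[E(x1 | x2)]: average the conditional means over the marginal of x2.
iterated = sum(cond_expect_x1(x2) * p for x2, p in g2.items())
# E(x1) computed directly from the joint density.
direct = sum(x1 * p for (x1, _), p in joint.items())

print(cond_expect_x1(0), cond_expect_x1(1))
print(iterated == direct)   # property (5.66)
```

The two conditional means differ, so $E(x_1 \mid x_2)$ really is a function of $x_2$ here, yet averaging it over $x_2$ recovers $E(x_1)$.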


5.6. ESTIMATION

We have already indicated in an intuitive way that statistics are used to
estimate parameters which in turn help to characterize populations. We have
actually worked a few problems involving statistics. However, much of our
time has been spent in the study of sampling distributions of statistics, that
is, in the study of a statistic as a random variable with a sampling distribution.
Now, we wish to assess the merits of statistics in terms of the properties
of the sampling distributions.

We have used the sample mean $\bar{x}$ and sample median to estimate the
population mean $\mu$; we have used the sample variance $s^2$ to estimate the
population variance $\sigma^2$. In a careful study of the nature of these and other
estimates, it is natural that we consider properties which a "good" estimate
should possess, and attempt to determine which, if any, estimate is best
for a given purpose.

Recall that a statistic was defined (see Sect. 5.2) as a numerical value
determined from some or all of the values which make up a sample. Often
the statistic is computed from the sample values in exactly the same way as
the corresponding parameter is determined from the population values.
Thus, when the sample is representative of the population, we would normally
expect the statistic to be representative of the parameter. For this reason we
require that the observations of the sample be randomly selected; otherwise,
we could reasonably infer very little about the parameter.

In studying properties of estimation it is customary that a distinction be
made between the rule which defines a statistic and the particular value
which results from applying the rule to a particular sample. Thus, for example,
when we consider $x_1, \ldots, x_n$ as representing particular values of a random
sample, the mean $\bar{x}$ is a value which is called an estimate of the population
mean. When, before a particular sample is actually taken, we think of
$x_1, \ldots, x_n$ as representing $n$ random observations, then the mean $\bar{x}$ is a random
variable and is called an estimator of the population mean. The distinction
between estimator and estimate is the same as that between a function $f(x)$
and a functional value $f(c)$: $f(x)$ is a variable defined in some domain of $x$,
and $f(c)$ is a constant corresponding to a specified value of $x$ equal to constant
$c$.

In general, let $\theta$ denote an unknown parameter of a distribution with
density function $f(x)$. Let $x_1, \ldots, x_n$ be a random sample of size $n$ taken from
this distribution. If we think of the $i$th ($i = 1, \ldots, n$) sample observation $x_i$
before a particular value is drawn, it is a random variable which can take
on any value of $x$. Let

$$t_n = t(x_1, \ldots, x_n) \tag{5.70}$$

a function of the $n$ sample observations, denote any statistic corresponding
to $\theta$. When $t_n$ is thought of as a function of $n$ random variables, it is called
an estimator of $\theta$. After the observations have been made, that is, after a
particular sample has been drawn, the statistic $t_n$ is a particular value which
is called an estimate of $\theta$. Our problem now is to study estimators, that is,
properties of sampling distributions of statistics, not estimates.

5.6.1. Consistent Estimators

We observe that an estimator should probably not be considered bad
simply because it can assume a value which deviates considerably from the
true value. However, if the bulk of the values of $t_n$ deviated considerably
from $\theta$, we might consider $t_n$ a bad estimator of $\theta$, particularly if this is the
case as $n$ becomes large. Thus, one of the first desirable properties we might
require is that there be high probability that the estimator be near the
parameter it is intended to estimate when the sample size is large. This leads
us to the following definition.

Definition. An estimator $t_n$ is a consistent estimator of $\theta$ if, for any
positive numbers $\delta$ and $\epsilon$, no matter how small, there exists an integer $n'$
such that the probability that $|t_n - \theta| < \epsilon$ is greater than $1 - \delta$ for all
$n > n'$; that is

$$P[\,|t_n - \theta| < \epsilon\,] > 1 - \delta \quad \text{for all } n > n'$$

This definition is obviously similar to the definition of convergence in the
mathematical sense, except that here we say that, given any small $\epsilon$, we can
find a sample size large enough so that, for all larger sample sizes, the probability
that $t_n$ differs from the true value $\theta$ by more than $\epsilon$ is as small as we please.
In this case we say $t_n$ converges in probability to $\theta$. So convergence in probability
means that $t_n$ is a consistent estimator of $\theta$.

It is not difficult to show that $\bar{x}$ is a consistent estimator of $\mu$, if it is
assumed that the population variance $\sigma^2$ is finite. Since, in this case, the variance
of the sampling distribution of $\bar{x}$ approaches zero as $n$ approaches
infinity, and $E(\bar{x}) = \mu$ for any $n$, it follows that the sampling distribution
of $\bar{x}$ must become concentrated at $\mu$ for large $n$. In fact, if the population is
normal, so that the true median and mode are the same as the mean $\mu$, it
follows that $\bar{x}$ is a consistent estimator of the population median and mode.
Also, the population variance and many other parameters have consistent
estimators.
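
One way to make the argument for $\bar{x}$ explicit is Chebyshev's inequality (used in Exercises 5.29 and 5.30), which gives $P(|\bar{x} - \mu| \ge \epsilon) \le \sigma^2 / (n\epsilon^2)$. Any $n'$ with $\sigma^2 / (n'\epsilon^2) \le \delta$ then satisfies the definition. The sketch below computes such an $n'$; the function name and the numerical values are illustrative assumptions only, and the bound is usually far from the smallest workable sample size.

```python
import math

def chebyshev_n_prime(sigma2, eps, delta):
    """Smallest integer n' with sigma2 / (n' * eps**2) <= delta.

    For every n > n', Chebyshev's inequality then gives
    P(|xbar - mu| < eps) >= 1 - sigma2 / (n * eps**2) > 1 - delta.
    """
    return math.ceil(sigma2 / (delta * eps ** 2))

# Illustrative values only: sigma^2 = 4, eps = 0.3, delta = 0.05.
sigma2, eps, delta = 4.0, 0.3, 0.05
n_prime = chebyshev_n_prime(sigma2, eps, delta)
print(n_prime)

# Bound on P(|xbar - mu| >= eps) for a sample size just beyond n'.
print(sigma2 / ((n_prime + 1) * eps ** 2) < delta)
```

Because a finite $n'$ exists for every choice of $\epsilon$ and $\delta$ whenever $\sigma^2$ is finite, the definition of consistency is met.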

Not all estimators are consistent. For example, the estimator $x_1$, the
first observation of a random sample of size $n$, of $\mu$ is not consistent. Further,
if $\bar{x}$ is the mean of a random sample drawn from a Cauchy distribution with
density function

$$f(x) = \frac{1}{\pi[1 + (x - \mu)^2]} \tag{5.71}$$

and median $\mu$ (the mean of this distribution does not exist), it can be shown
that $\bar{x}$ is not a consistent estimator of $\mu$. Note that the population variance
does not exist. In both these cases the sampling distribution of
the estimator is the same as the original distribution. Thus, the estimator
does not increase in accuracy as $n$ increases.

The criterion of consistency is not very practical, since it has to do with
a limiting property. We point out two reasons why this is so. First, samples
have a finite number of observations, and the definition requires an infinite
number. Second, when there is one consistent estimator $t_n$ of $\theta$, it is possible
to define infinitely many. For example, when $t_n$ is consistent, so is

$$\frac{n + a}{n + b}\, t_n \tag{5.72}$$

for all fixed real numbers $a$ and $b \ne -n$.

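
The contrast between a consistent and an inconsistent sample mean can be seen in a small simulation. This is a sketch under stated assumptions: the seed, the replication count, the use of the interquartile range as a measure of spread, and the helper names are all choices made here for illustration. The Cauchy draws use the inverse of the cumulative distribution of density (5.71) with $\mu = 0$.

```python
import math
import random

def iqr_of_sample_means(draw, n, reps, rng):
    """Interquartile range of `reps` simulated sample means, each from n draws."""
    means = sorted(sum(draw(rng) for _ in range(n)) / n for _ in range(reps))
    return means[(3 * reps) // 4] - means[reps // 4]

def draw_normal(rng):
    # Standard normal observation (finite variance).
    return rng.gauss(0.0, 1.0)

def draw_cauchy(rng):
    # Inverse-CDF draw from the Cauchy density with mu = 0.
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(12345)  # fixed seed so the run is repeatable
for n in (1, 200):
    print(n,
          round(iqr_of_sample_means(draw_normal, n, 1000, rng), 3),
          round(iqr_of_sample_means(draw_cauchy, n, 1000, rng), 3))
```

For the normal population the spread of $\bar{x}$ should shrink roughly like $1/\sqrt{n}$, while for the Cauchy population it should stay near the spread of a single observation.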


5.6.2. Unbiased Estimators

Next we give a property of a good estimator which is designed to restrict
the number of possible consistent estimators. When $t_n$ is a consistent estimator
of $\theta$, its sampling distribution must have practically all of its values
in a neighborhood of $\theta$ when $n$ is large. Thus the measure of central tendency
of $t_n$ must be at or very near $\theta$ when $n$ is large. If we remove the restriction
"for large $n$," we may select from among all consistent estimators a much
smaller class by applying the following definition.

Definition. An estimator $t_n$ is an unbiased estimator of $\theta$ if

$$E(t_n) = \theta$$

This definition applies for all $n$ and $\theta$ and requires that the mean of the
sampling distribution of any statistic equal the parameter which the statistic
is intended to estimate. We could have required that the median (or some
other measure of central tendency) of the sampling distribution of $t_n$ be
equal to $\theta$. But we did not; we defined an unbiased estimator in the above
way because of its mathematical convenience, and we shall use the term
"unbiased" in this way.

Let $\bar{x}$ and $s^2$ denote the mean and variance of a random sample of $n$
observations drawn from the same infinite population with mean $\mu$ and
variance $\sigma^2$. We have already stated that the mean of the sampling distribution
of $\bar{x}$ is equal to the population mean. This follows directly from Eqs.
(5.53) and (5.55) and the definition of $\mu$. That is

$$\begin{aligned} E(\bar{x}) &= E\left( \frac{x_1 + \cdots + x_n}{n} \right) \\ &= \frac{1}{n} E(x_1 + \cdots + x_n) \\ &= \frac{1}{n} [E(x_1) + \cdots + E(x_n)] \\ &= \frac{1}{n} (\mu + \cdots + \mu) \\ &= \mu \end{aligned}$$

Thus, $\bar{x}$ is an unbiased estimator of $\mu$. If we use the properties indicated, it
follows that

$$\begin{aligned} E(s^2) &= E\left[ \frac{\sum (x_i - \bar{x})^2}{n - 1} \right] \\ &= \frac{1}{n - 1} E\left( \sum x_i^2 - n\bar{x}^2 \right) \\ &= \frac{1}{n - 1} \left[ \sum E(x_i^2) - nE(\bar{x}^2) \right] \\ &= \frac{1}{n - 1} \left[ \sum (\sigma^2 + \mu^2) - n(\sigma_{\bar{x}}^2 + \mu^2) \right] \\ &= \frac{1}{n - 1} \left[ n(\sigma^2 + \mu^2) - n\left( \frac{\sigma^2}{n} + \mu^2 \right) \right] \end{aligned}$$

or

$$E(s^2) = \sigma^2 \tag{5.73}$$

Thus, $s^2$ is an unbiased estimator of $\sigma^2$. This is the primary reason we defined
the sample variance as we did.

If we had defined the sample variance as

$$s^{*2} = \frac{\sum (x_i - \bar{x})^2}{n} \tag{5.74}$$

then we would have

$$E(s^{*2}) = E\left( \frac{n - 1}{n}\, s^2 \right) = \frac{n - 1}{n}\, E(s^2)$$

or

$$E(s^{*2}) = \frac{n - 1}{n}\, \sigma^2 \tag{5.75}$$

Thus, we would have a biased estimator of $\sigma^2$ which is too small on the
average. For small sample sizes the degree of bias is great, but as $n$ approaches
infinity the bias vanishes.

It can be shown (see Exercise 5.52) that $s^2$ is a consistent estimator.
Thus, the biased estimator

$$s^{*2} = \frac{n - 1}{n}\, s^2$$

is a consistent estimator. We know that the first observation $x_1$ in a random
sample of $n$ observations is not a consistent estimator of $\mu$. Since $E(x_1) = \mu$,
$x_1$ is an unbiased estimator of $\mu$. Therefore, unbiased estimators may not
be consistent. Thus, neither property implies the other. However, a consistent
estimator $t_n$ with finite $E(t_n)$ must tend to be unbiased for large $n$.
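
The unbiasedness of the sample variance with divisor $n - 1$, and the bias of the version with divisor $n$, can be verified exactly by enumerating every possible sample from a small population. This is a sketch only: the three population values and the sample size are hypothetical, and sampling is with replacement so that the observations are independent, as the derivation above assumes.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 6]   # hypothetical values, each drawn with probability 1/3
n = 2                    # sample size; draws are with replacement

mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / len(population)

samples = list(product(population, repeat=n))   # all equally likely samples

def mean(xs):
    return Fraction(sum(xs), len(xs))

def ss(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

E_xbar = sum(mean(s) for s in samples) / len(samples)
E_s2 = sum(ss(s) / (n - 1) for s in samples) / len(samples)    # divisor n - 1
E_s2_star = sum(ss(s) / n for s in samples) / len(samples)     # divisor n

print(E_xbar == mu)                                  # sample mean is unbiased
print(E_s2 == sigma2)                                # Eq. (5.73)
print(E_s2_star == Fraction(n - 1, n) * sigma2)      # Eq. (5.75)
```

With $n = 2$ the divisor-$n$ version averages only half of $\sigma^2$, which illustrates how large the bias can be for small samples.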

5.6.3. Best Estimators

There may be several consistent and unbiased estimators of a parameter
from which we wish to select the best according to some criterion. It seems
fairly reasonable that we should select that estimator which is consistently
closest to the parameter being estimated. But what should the measure of
closeness be? It is customary that the variance of the sampling distribution
be used for this purpose.

In view of the preceding discussion, we say that any estimator which has
minimum variance among all possible estimators of $\theta$ is an efficient estimator.
There may be several estimators of $\theta$ which are efficient. In comparing any
estimator $t_n'$ with variance $V(t_n')$ against an efficient estimator $t_n$ with variance
$V(t_n)$, we use the ratio

$$\frac{V(t_n)}{V(t_n')} \tag{5.76}$$

called the efficiency of $t_n'$.

For example, consider the comparison of two estimators, the mean and
the median, computed from a random sample of size $n$ drawn from a normal
population with mean $\mu$ and variance $\sigma^2$. Let $t_n = \bar{x}$ and $t_n' = x_m$. It can
be shown that $t_n$ is normally distributed with mean $\mu$ and variance $V(t_n) = \sigma^2/n$
and, for large $n$, that $t_n'$ is approximately normally distributed with
mean $\mu$ and variance

$$V(t_n') = \frac{\pi \sigma^2}{2n}$$

Both estimators are consistent and unbiased, but the variance of the mean
is less than the variance of the median. Therefore, in this case, the mean is
considered a better estimator of $\mu$ than the median when $n$ is large. For
small $n$ the variance of the median must be determined for each $n$. Since it
appears that $V(t_n) < V(t_n')$ always, we say that $\bar{x}$ is a better estimator of $\mu$
than $x_m$ when one is sampling from a normal population. The reader is cautioned
that this is not necessarily the case for all distributions.

For samples of size $n$ randomly drawn from a normal population, it can
be shown that the minimum variance of all unbiased estimators of $\mu$ is $\sigma^2/n$.
Since $V(\bar{x}) = \sigma^2/n$, $\bar{x}$ is an efficient estimator, and $x_m$ has an efficiency of
$2/\pi$ when $n$ is large.
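
The large-sample efficiency $2/\pi \approx 0.64$ of the median can be compared with a simulated value. This is a rough sketch: the sample size, number of replications, seed, and function name are arbitrary choices made here, and a simulated ratio will only approximate the limiting value.

```python
import math
import random
import statistics

def simulated_variances(n, reps, seed):
    """Variances of the sample mean and sample median over `reps` normal samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)

v_mean, v_median = simulated_variances(n=101, reps=4000, seed=2024)
efficiency = v_mean / v_median          # simulated counterpart of ratio (5.76)
print(round(efficiency, 3), round(2 / math.pi, 3))
```

An odd sample size is used so that the median is a single order statistic.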

For fixed $n$, it is possible for one estimator $t_n$ to be biased and another
estimator $t_n'$ to be unbiased and still to have $t_n$ consistently closer to $\theta$ than
$t_n'$. In this case the measure of variability of $t_n$ about $\theta$ is not the variance.
It is

$$E[(t_n - \theta)^2] \tag{5.77}$$

which is called the second moment of $t_n$ about $\theta$. When $\theta$ is equal to the mean
of $t_n$, then Eq. (5.77) reduces to the variance of $t_n$; that is, when $\theta = E(t_n)$,

$$E[(t_n - \theta)^2] = V(t_n)$$

With this in mind, we give the following more general definition as a basis
for selecting good estimators.

Definition. An estimator $t_n$ is called a best estimator of the parameter
$\theta$ if $t_n$ is such that

$$E[(t_n - \theta)^2] \le E[(t_n' - \theta)^2] \tag{5.78}$$

where $t_n'$ is any estimator of $\theta$.

Note that this definition does not require either the consistency or unbi- 
asedness property. However, for unbiased estimators, as n approaches infinity, 
we see that the consistency property necessarily follows. This definition of 
a best estimator has certain disadvantages, but so do other definitions which 
could be substituted. In any case, our definition of best estimator has proved 
to be very useful in both theory and application. 

Example 5.15. Let $x_1, x_2, \ldots, x_n$ be a random sample from a population
with mean $\mu$ and variance $\sigma^2$. Let

$$t_n' = \frac{x_1 + x_n}{n}$$

$$t_n'' = \frac{3x_1 - x_2 - x_{n-1} + 4x_n}{5}$$

$$t_n''' = \tfrac{1}{2} x_1 + \tfrac{1}{3} x_2 + \tfrac{1}{6} x_n$$

(a) Which, if any, of the estimators $t_n'$, $t_n''$, $t_n'''$ are unbiased estimators of
$\mu$? Show why. (b) Which of these three estimators is best? (c) Of all possible
linear unbiased estimators of $\mu$, which is the best? Show why.

According to expectation properties (5.53) and (5.55), we have

$$E(t_n') = \frac{1}{n} [E(x_1) + E(x_n)]$$

Since $x_i$ ($i = 1, \ldots, n$) is a random variable distributed like the parent
population, it follows that $E(x_i) = \mu$ and $V(x_i) = \sigma^2$ for each $i$. Thus

$$E(t_n') = \frac{1}{n} (\mu + \mu) = \frac{2\mu}{n}$$

Following a similar argument, we find that

$$E(t_n'') = \tfrac{3}{5} E(x_1) - \tfrac{1}{5} E(x_2) - \tfrac{1}{5} E(x_{n-1}) + \tfrac{4}{5} E(x_n) = \mu$$

and

$$E(t_n''') = \tfrac{1}{2} E(x_1) + \tfrac{1}{3} E(x_2) + \tfrac{1}{6} E(x_n) = \mu$$

Thus $t_n''$ and $t_n'''$ are unbiased estimators of $\mu$ when $n$ is equal to or greater
than 4 and 3, respectively. The estimator $t_n'$ underestimates $\mu$ except when
$n = 2$, in which case it is an unbiased estimator of $\mu$.

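In order to determine which of the three estimators is best, we must compute
$E[(t_n - \mu)^2]$ for each. When $E(t_n) \ne \theta$, we have

$$\begin{aligned} E[(t_n - \theta)^2] &= E\{[(t_n - E(t_n)) + (E(t_n) - \theta)]^2\} \\ &= E\{[t_n - E(t_n)]^2\} + 2E\{[t_n - E(t_n)][E(t_n) - \theta]\} + E\{[E(t_n) - \theta]^2\} \end{aligned}$$

or

$$E[(t_n - \theta)^2] = V(t_n) + [E(t_n) - \theta]^2 \tag{5.79}$$

since

$$\begin{aligned} E\{[t_n - E(t_n)][E(t_n) - \theta]\} &= [E(t_n) - \theta]\, E[t_n - E(t_n)] \\ &= [E(t_n) - \theta][E(t_n) - E(t_n)] \\ &= 0 \end{aligned}$$

Thus, for the first estimator of $\mu$ we have, on substituting in Eq. (5.79)

$$E[(t_n' - \mu)^2] = V(t_n') + \left( \frac{2\mu}{n} - \mu \right)^2 = V(t_n') + \frac{(n - 2)^2}{n^2}\, \mu^2$$

Since, when Eq. (5.61) is applied

$$V(t_n') = \frac{1}{n^2} V(x_1) + \frac{1}{n^2} V(x_n) = \frac{2\sigma^2}{n^2}$$

the second moment of $t_n'$ about $\mu$ becomes

$$E[(t_n' - \mu)^2] = \frac{2\sigma^2 + (n - 2)^2 \mu^2}{n^2}$$

Because the other two estimators of $\mu$ are unbiased, the last term of Eq. (5.79)
vanishes, and we have

$$E[(t_n'' - \mu)^2] = V(t_n'') = \tfrac{9}{25} V(x_1) + \tfrac{1}{25} V(x_2) + \tfrac{1}{25} V(x_{n-1}) + \tfrac{16}{25} V(x_n) = \tfrac{27}{25}\, \sigma^2$$

and

$$E[(t_n''' - \mu)^2] = V(t_n''') = \tfrac{1}{4} V(x_1) + \tfrac{1}{9} V(x_2) + \tfrac{1}{36} V(x_n) = \tfrac{7}{18}\, \sigma^2$$

Hence, $t_n'''$ is always a better estimator than $t_n''$, since $V(t_n''') < V(t_n'')$. The
comparison of $t_n'$ and $t_n'''$ depends on $n$ and $\mu$. First, we note that $t_n'''$ is not
defined unless $n \ge 3$. Now $E[(t_n' - \mu)^2] < V(t_n''')$ when

$$\frac{2\sigma^2 + (n - 2)^2 \mu^2}{n^2} < \frac{7\sigma^2}{18}$$

or

$$18(n - 2)^2 \mu^2 < (7n^2 - 36)\, \sigma^2$$

or

$$\mu^2 < \frac{(7n^2 - 36)\, \sigma^2}{18(n - 2)^2} \tag{5.80}$$

Finally, $t_n'$ is the best estimator of $\mu$ when Eq. (5.80) and $n \ge 4$ are satisfied;
otherwise, $t_n'''$ is the best estimator when $n \ge 4$.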
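
The comparison can be traced numerically with the decomposition (5.79), second moment equals variance plus squared bias. The sketch below assumes the three estimators are read as $t_n' = (x_1 + x_n)/n$, $t_n'' = (3x_1 - x_2 - x_{n-1} + 4x_n)/5$, and $t_n''' = x_1/2 + x_2/3 + x_n/6$; the function name and the trial values of $n$, $\mu$, and $\sigma^2$ are hypothetical.

```python
from fractions import Fraction as F

def second_moments(n, mu, sigma2):
    """Second moments about mu of the three estimators, via Eq. (5.79)."""
    mu, sigma2 = F(mu), F(sigma2)
    # t'   = (x_1 + x_n) / n: variance 2*sigma2/n^2, bias 2*mu/n - mu
    m1 = 2 * sigma2 / n ** 2 + (2 * mu / n - mu) ** 2
    # t''  = (3x_1 - x_2 - x_{n-1} + 4x_n) / 5: unbiased, so only the variance
    m2 = F(9 + 1 + 1 + 16, 25) * sigma2
    # t''' = x_1/2 + x_2/3 + x_n/6: unbiased, so only the variance
    m3 = (F(1, 4) + F(1, 9) + F(1, 36)) * sigma2
    return m1, m2, m3

def t1_beats_t3(n, mu, sigma2):
    """Inequality comparing t' with t''': mu^2 < (7n^2 - 36) sigma2 / (18 (n-2)^2)."""
    return F(mu) ** 2 < F(7 * n ** 2 - 36, 18 * (n - 2) ** 2) * F(sigma2)

for n, mu, sigma2 in [(10, 1, 4), (10, 3, 1)]:
    m1, m2, m3 = second_moments(n, mu, sigma2)
    print(n, mu, sigma2, float(m1), float(m2), float(m3), t1_beats_t3(n, mu, sigma2))
```

When $\mu$ is small relative to $\sigma$, the biased estimator $t_n'$ has the smallest second moment; when $\mu$ is large, its bias dominates and $t_n'''$ is preferred.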

Any linear estimator of $\mu$ is of the form

$$t_n = a_1 x_1 + \cdots + a_n x_n$$

where $a_1, \ldots, a_n$ are real numbers with at least one $a_i$ ($i = 1, \ldots, n$) being
different from zero. Now $t_n$ is unbiased when $E(t_n) = \mu$. Since $E(t_n) =
a_1 E(x_1) + \cdots + a_n E(x_n) = (a_1 + \cdots + a_n)\mu$, $t_n$ is an unbiased estimator
of $\mu$ when $a_1 + \cdots + a_n = 1$. This restriction need not be stated if we
write $t_n$ as

$$t_n = \frac{b_1 x_1 + \cdots + b_n x_n}{b_1 + \cdots + b_n} \tag{5.81}$$

where $b_1, \ldots, b_n$ are real numbers not all of which are zero. Since Eq. (5.81)
is an unbiased estimator of $\mu$, the second moment of $t_n$ about $\mu$ is also the
variance of $t_n$.

On applying Eq. (5.61) we find the variance of $t_n$ to be

$$V(t_n) = \frac{(b_1^2 + \cdots + b_n^2)\, \sigma^2}{(b_1 + \cdots + b_n)^2} \tag{5.82}$$

We require values of $b_1, \ldots, b_n$ which make $V(t_n)$ a minimum. From the
calculus, we know that such values must satisfy the $n$ equations

$$\frac{\partial V(t_n)}{\partial b_j} = 0 \quad (j = 1, \ldots, n)$$

simultaneously. The partial derivative of $V(t_n)$ with respect to $b_j$ is

$$\frac{\partial V(t_n)}{\partial b_j} = \frac{2\sigma^2 [\, b_j (b_1 + \cdots + b_n) - (b_1^2 + \cdots + b_n^2)\,]}{(b_1 + \cdots + b_n)^3}$$

Letting each of these $n$ partial derivatives equal zero leads to the unique
solution

$$b_j = \frac{b_1^2 + \cdots + b_n^2}{b_1 + \cdots + b_n} \quad (j = 1, \ldots, n)$$

That is, $V(t_n)$ is a minimum when $b_1 = \cdots = b_n = b$. Hence, the coefficient
of $x_i$ in Eq. (5.81) is

$$\frac{b_i}{b_1 + \cdots + b_n} = \frac{b}{nb} = \frac{1}{n}$$

so that $t_n = \bar{x}$. Therefore, no linear unbiased combination of the observations can
have a smaller variance than $\bar{x}$; that is, $\bar{x}$ is the best of the linear unbiased
estimators of $\mu$.

As was pointed out earlier, when the random sample is drawn from a
normal population, the minimum variance of all unbiased estimators of $\mu$
is $\sigma^2/n$. Since $V(\bar{x}) = \sigma^2/n$, it follows that $\bar{x}$ is not only the best linear unbiased
estimator of $\mu$, but is the best function of any kind which can be used to
estimate $\mu$.

5.7. THE METHOD OF MAXIMUM LIKELIHOOD

By this time the reader must have asked, "Is it possible to describe a
method which gives estimators with desirable properties, or must one always
apply some arbitrary method when introducing an estimator of a parameter?"
Fortunately, it is possible to describe a method with many desirable
properties. There are actually several methods [3, 5, 18], all of which have
some merit.

We describe a very popular method, the method of maximum likelihood
described in general by Fisher [13, 14]. This method is attractive because it
gives, under fairly general conditions, estimators which are often best or
nearly so and which are often quite easy to obtain. There are cases where a
maximum likelihood estimator is very poor, but in most applications the
estimators have desirable properties. Another important feature of the method
of maximum likelihood is that it yields estimators which are approximately
normally distributed for large samples. For fixed $n$ the maximum likelihood
estimators are often biased. Fortunately, in cases where an unbiased estimator
is desirable, it is often possible to multiply the maximum likelihood estimator
by a coefficient involving only $n$ so that the resulting estimator is unbiased.

Let $f(x; \theta)$ denote the density function of a random variable $x$, where
$\theta$ is the parameter to be estimated. The form of the function is assumed to
be known, but $\theta$ is unknown. Let $x_1, \ldots, x_n$ denote a random sample of $n$
observations. Then there are $n$ random variables, each with a density function
of the form $f(x_i; \theta)$ ($i = 1, \ldots, n$). The joint density function,
$f(x_1, \ldots, x_n; \theta)$, of the random variables $x_1, \ldots, x_n$ is given by

$$f(x_1, \ldots, x_n; \theta) = f(x_1; \theta) \cdots f(x_n; \theta)$$

since $x_1, \ldots, x_n$ are mutually independent. If we think of $x_1, \ldots, x_n$ as
values for a particular (fixed) sample, then $f(x_1, \ldots, x_n; \theta)$ is a function of
$\theta$ only. Such a function is called the likelihood function and is indicated
by $L(\theta)$. Thus, for a fixed set of sample values $x_1, \ldots, x_n$

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) \tag{5.83}$$

is the likelihood function of $\theta$. That value of $\theta$, denoted by $\hat{\theta}$, for which $L(\theta)$
is a maximum is called the maximum likelihood estimator of $\theta$. Definition
(5.83) applies for both discrete and continuous cases.

The generalization of Eq. (5.83) to a univariate density function with
more than one parameter is straightforward. Let $f(x; \theta_1, \ldots, \theta_k)$ denote the
density function of a random variable $x$, where $\theta_1, \ldots, \theta_k$ represent $k$
parameters to be estimated. Then for a fixed sample of values $x_1, \ldots, x_n$
the likelihood function of $\theta_1, \ldots, \theta_k$ is

$$L(\theta_1, \ldots, \theta_k) = \prod_{i=1}^{n} f(x_i; \theta_1, \ldots, \theta_k) \tag{5.84}$$

Those values of the $k$ parameters, denoted by $\hat{\theta}_1, \ldots, \hat{\theta}_k$, for which
$L(\theta_1, \ldots, \theta_k)$ is a maximum are called the maximum likelihood estimators of
$\theta_1, \ldots, \theta_k$, respectively.

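Example 5.16. A random sample of size $n$ is drawn from a normal
distribution with mean $\mu$ and variance $\sigma^2$.

(a) Find the maximum likelihood estimator of $\mu$ when $\sigma^2 = 1$. (b) Find
the maximum likelihood estimators of $\mu$ and $\sigma^2$. For (a) the univariate
density function is

$$f(x; \mu) = \frac{1}{\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2}$$

and the likelihood function is

$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-(x_i - \mu)^2 / 2} = \frac{e^{-\sum (x_i - \mu)^2 / 2}}{(2\pi)^{n/2}}$$

That value of $\mu$ which maximizes $L(\mu)$ will also maximize $\log_e L(\mu) = l(\mu)$.
Now

$$l(\mu) = -\frac{n}{2} \log_e (2\pi) - \frac{\sum (x_i - \mu)^2}{2}$$

When

$$\frac{dl(\mu)}{d\mu} = 0$$

it follows that $\sum (x_i - \hat{\mu}) = 0$ or

$$\hat{\mu} = \frac{\sum x_i}{n} = \bar{x}$$

Thus the maximum likelihood estimator is the sample mean, which is
unbiased.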

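For (b) the univariate density function is

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\, \sigma}\, e^{-(x - \mu)^2 / (2\sigma^2)}$$

and the likelihood function is

$$L(\mu, \sigma^2) = \frac{e^{-\sum (x_i - \mu)^2 / (2\sigma^2)}}{(2\pi)^{n/2} \sigma^n}$$

Those values of $\mu$ and $\sigma^2$ which maximize $L(\mu, \sigma^2)$ will also maximize
$\log_e L(\mu, \sigma^2) = l(\mu, \sigma^2)$. Now

$$l(\mu, \sigma^2) = -\frac{\sum (x_i - \mu)^2}{2\sigma^2} - \frac{n}{2} \log_e \sigma^2 - \frac{n}{2} \log_e (2\pi)$$

Therefore

$$\frac{\partial l(\mu, \sigma^2)}{\partial \mu} = \frac{-2 \sum (x_i - \mu)(-1)}{2\sigma^2}$$

and

$$\frac{\partial l(\mu, \sigma^2)}{\partial \sigma^2} = \frac{-\sum (x_i - \mu)^2 (-1)}{2(\sigma^2)^2} - \frac{n}{2} \cdot \frac{1}{\sigma^2}$$

When

$$\frac{\partial l(\mu, \sigma^2)}{\partial \mu} = 0 \quad \text{and} \quad \frac{\partial l(\mu, \sigma^2)}{\partial \sigma^2} = 0$$

it follows that

$$\frac{\sum (x_i - \hat{\mu})}{\hat{\sigma}^2} = 0$$

and

$$\frac{\sum (x_i - \hat{\mu})^2}{2\hat{\sigma}^4} - \frac{n}{2\hat{\sigma}^2} = 0$$

Solving these last two equations simultaneously gives

$$\hat{\mu} = \bar{x} \quad \text{and} \quad \hat{\sigma}^2 = \frac{\sum (x_i - \hat{\mu})^2}{n} = \frac{\sum (x_i - \bar{x})^2}{n}$$

Clearly, $\hat{\mu}$ is an unbiased estimator of $\mu$, but $\hat{\sigma}^2$ is a biased estimator of $\sigma^2$.
However, as we have already noted, $n\hat{\sigma}^2 / (n - 1) = s^2$ is an unbiased estimator
of $\sigma^2$.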
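
The closed-form solution can be checked by substituting it back into the two partial derivatives of the log-likelihood. The sketch below does this for a small made-up data set; the function names and the observations are hypothetical, and the last line compares $n\hat{\sigma}^2/(n - 1)$ with the usual sample variance from Python's `statistics` module.

```python
import statistics

def normal_mle(xs):
    """Closed-form maximum likelihood estimates (mu_hat, sigma2_hat) for a normal sample."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n   # divisor n, hence biased
    return mu_hat, sigma2_hat

def score(xs, mu, sigma2):
    """Partial derivatives of the log-likelihood l(mu, sigma2) with respect to mu and sigma2."""
    n = len(xs)
    d_mu = sum(x - mu for x in xs) / sigma2
    d_sigma2 = sum((x - mu) ** 2 for x in xs) / (2 * sigma2 ** 2) - n / (2 * sigma2)
    return d_mu, d_sigma2

data = [4.1, 5.3, 3.8, 6.0, 4.8]          # made-up observations
mu_hat, sigma2_hat = normal_mle(data)
print(mu_hat, sigma2_hat)
print(score(data, mu_hat, sigma2_hat))     # both entries should be near zero

n = len(data)
print(n * sigma2_hat / (n - 1), statistics.variance(data))   # s^2 computed two ways
```

Multiplying $\hat{\sigma}^2$ by $n/(n - 1)$ is an instance of the remark made earlier in this section: a coefficient involving only $n$ removes the bias.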

Example 5.17. A random sample of size n is drawn from a dichotomous
population with density function

$$d(x; p) = \begin{cases} 1 - p, & \text{when } x = 0 \\ p, & \text{when } x = 1 \end{cases}$$

(a) Find the maximum likelihood estimator of p if the successful event
(x = 1) occurs $n_1$ times, where $0 < n_1 < n$. (b) Find the maximum likelihood
estimate if it is known that p = 0.2, 0.4, 0.6, or 0.8, n = 6, and $n_1 = 2$.
The likelihood function is the product, in some order, of $n_1$ probabilities
p and $n - n_1$ probabilities $1 - p$. That is,

$$L(p) = p^{n_1}(1 - p)^{n - n_1} \tag{5.85}$$

Even though x is a discrete variable, the likelihood function may be
considered a continuous function of p for $0 < p < 1$. Now

$$l = \log_e L(p) = n_1 \log_e p + (n - n_1)\log_e (1 - p)$$

and

$$\frac{dl}{dp} = \frac{n_1}{p} - \frac{n - n_1}{1 - p}$$

When $dl/dp = 0$, then

$$n_1(1 - \hat{p}) - (n - n_1)\hat{p} = 0$$

or

$$\hat{p} = \frac{n_1}{n}$$

For (b), Eq. (5.85) is a function of a discrete variable p. To find the
maximum likelihood estimate of p, we simply substitute each possible value
of p in Eq. (5.85) and observe which value makes L(p) a maximum. Now

L(p = 0.2) = (0.2)^2 (0.8)^4 = 0.016384
L(p = 0.4) = (0.4)^2 (0.6)^4 = 0.020736
L(p = 0.6) = (0.6)^2 (0.4)^4 = 0.009216
L(p = 0.8) = (0.8)^2 (0.2)^4 = 0.001024

Thus $\hat{p} = 0.4$ is the maximum likelihood estimate of p.

If n and $n_1$ were not specified, we would expect the estimate to be that
admissible value closest to $n_1/n$. In the above case $n_1/n = 1/3 = 0.33$ is closer to 0.4
than to 0.2. Thus, we would expect 0.4 to be the estimate of p.
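
Part (b) amounts to evaluating the likelihood on a finite grid and taking the largest value. A minimal sketch of that search follows; the function and variable names are illustrative only.

```python
def likelihood(p, n, n1):
    """Likelihood of Eq. (5.85): n1 successes in n dichotomous trials."""
    return p ** n1 * (1 - p) ** (n - n1)


candidates = [0.2, 0.4, 0.6, 0.8]  # the only admissible values of p in part (b)
n, n1 = 6, 2

values = {p: likelihood(p, n, n1) for p in candidates}
p_hat = max(values, key=values.get)  # candidate with the largest likelihood

for p, value in values.items():
    print(f"L(p = {p}) = {value:.6f}")
print("maximum likelihood estimate:", p_hat)
```

When p may take any value in (0, 1), the same likelihood is maximized at $n_1/n$, as found in part (a).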

5.8. EXERCISES

5 34. LetjTi.x, x, be indepcndcnirandom variables with Efx,) = 3, 

^(x,) = -1, y(x,) = I. Vix,) = 2, and t'(x,) = 3 Find (a) E[x, + 
2x.- - ixA- m-EOx, -^ffy yf.r, (<» Which, 

if any, of the answers in (a), (b), and (c) depend on the independence
of the random variables? (e) Find $\mathrm{cov}(x_1, \bar{x})$, where $\bar{x} = (x_1 + x_2 + x_3)/3$.
5.35. Prove Eqs. (5.52), (5.53), and (5.54), properties of expectation.
5.36. Prove Eq. (5.55) for k = 3; in general.
5.37. Prove Eq. (5.56) for a bivariate distribution.

5.38. Prove Eqs. (5.59) and (5.61). 

5.39. Two dice are tossed and the sum x on the two faces observed. Find
E(x) and V(x).

5.40. Work Exercise 5.39 for n dice. 

5.41. A well-balanced coin is tossed until a head appears. What is the expected 
number of tosses? 

5.42. Prove Theorem 5.3. 

5.43. Prove Theorem 5.6. 

5.44. The random variable x is normally distributed with mean yx and variance 
o-=. Find E[(x - H-V] and Efx’). 

5.45. A seasonal item brings a net profit of P dollars for each item sold and 
a net loss of L dollars for each item not sold at the end of the season. 
Suppose the number of customer orders, x, during any season is a random 
variable which has the Poisson density function [Eq. (3.15)] with m = 5.
How many items should be stocked so that the expected value of the 
profit is a maximum? 

5.46. Prove properties (5.66) and (5.68) on conditional expectation. 

5.47. The joint density function $f(x_1, x_2)$ of two discrete random variables $x_1$
and $x_2$ is given in the table. Find the conditional density function of $x_1$
and the expectation $E(x_1 \mid x_2)$ for each $x_2$. Use Eq. (5.66) to find $E(x_1)$.

Table 5.8

            1      2      3

    0      0.2    0.1    0.0
    1      0.0    0.1    0.3
    2      0.1    0.2    0.0


5.48. If /(x,, Xj) = 2, 0 < X, < x., 0 < Xj < 1, find the conditional density 
functions, ^(xi l^j) and Efx, (x,) and Eix^). 

5.49. A box contains five similar circular disks, each bearing exactly one of 
the marks 1, 2, 3, 4, 5. If two disks are drawn without replacement, what 
is the expected value of the sum of the two numbers? 

5.50. If $f(x_1, x_2) = x_1 + x_2$, $0 < x_1 < 1$, $0 < x_2 < 1$, find $E(x_1 \mid x_2)$ and $E(x_1)$.

5.51. If $t_n$ is an unbiased estimator of $\theta$, can it be expected, on repeated
sampling, to underestimate the true parameter $\theta$ half the time? Explain.

5.52. If the variance of a random sample of size n is defined as 

^'2 ^ 2 iXj - x)- 


it can be shown that 


n
where

$$\mu_4 = E[(x - \mu)^4]$$

(a) Show that $s'^2$ is a consistent estimator. Show that $s^2$ is also a
consistent estimator. (b) Find $V(s'^2)$ when the sample is drawn from a normal
population.

5.53. For random samples of size 2n let

2 *1 2 *1 
Tf = Ond Sf ~ ‘ 

denote two estimators of the population mean $\mu$. Which is the better
estimator of $\mu$? Explain why.

5^ Let X, X| be a random sample from a population with mean /» 
and variance o’ Let 


I, B J /, s 7(.iri + *<) 

I _ 2xi + Xi -f Xi 4- 2xt 
* 6 

denote estimators of $\mu$. (a) Which, if any, of the estimators of $\mu$ are
unbiased? (b) Which of the five estimators is best? (c) Let


Is r, an unbiased estimator of o-*7 Explain (d) Are any of the estimalon 
of n consistent’ Explain 

5.55. Three random and independent samples of sizes $n_1 = 12$, $n_2 = 8$, and
$n_3 = 4$ drawn from a population with variance $\sigma^2$ have unbiased variances
$s_1^2$, $s_2^2$, and $s_3^2$, respectively. (a) Prove that

$$\frac{12 s_1^2 + 8 s_2^2 + 4 s_3^2}{24}$$

is an unbiased estimator of $\sigma^2$. (b) Construct another linear function
of the three sample variances which is an unbiased estimator of $\sigma^2$.

5.56. Let $x_1, \ldots, x_k$ be random variables which are independently distributed
with means $\theta$ and variances $\sigma_1^2, \ldots, \sigma_k^2$. Prove that

has minimum variance when the weight is inversely proportional to
$\sigma_i^2$ $(i = 1, \ldots, k)$.

5.57. (a) Find the variance of f in Exercise 5.56 when 6i = Ifk (f = 1, . . . , k). 
(b) Assign values, not all equal, to the o-®’s in Exercise 5.56 and compare 
the variance of f found in (a) with that found in Exercise 5.56. 

5.58. A random sample of size n is drawn from a normal population with
mean zero and variance $\sigma^2$. Find the maximum likelihood estimator of $\sigma^2$.

5.59. Find the maximum likelihood estimator of 6 for the density function 

= x^O 

5.60. Find the maximum likelihood estimator of m for the Poisson density 
function. 

5.61. Two observations drawn at random from a population with density 
function 

f(x ; ni) = 1 + >nix — 0 < x 1 

are to be used to find an estimator of m. (a) Find the maximum likelihood
estimator of m. (b) If the observations are 0.65 and 0.90, what is
the maximum likelihood estimate of m? (c) Discuss the properties of
this estimator.

5.62. Suppose the random variable x is uniformly distributed over an interval
of length l with mean $\mu$, that is, center point, unknown. (a) Find the
maximum likelihood estimator of $\mu$ for a random sample of size n taken
from this distribution. (b) Find the maximum likelihood estimate of
$\mu$ if the random observations are 3.2, 2.7, 3.0, 2.2, and 3.4, and it is known
that l = 2. (c) Use the results of (b) to select the best estimate of $\mu$ and
to define the best density function.

5.63. Let $p_1, p_2, \ldots, p_k$ denote the probabilities that k mutually exclusive and
exhaustive possible outcomes $O_1, O_2, \ldots, O_k$, respectively, will occur
on a single trial of an experiment. A random sample of size n is drawn
with replacement from such a distribution. Find the maximum likelihood
estimators $\hat{p}_i$ $(i = 1, \ldots, k)$, if $n_i$ denotes the observed frequency
in the ith category $O_i$ and $n_1 + \cdots + n_k = n$ and $p_1 + \cdots + p_k = 1$.

5.64. For a certain large group of people, suppose $p^2 + 2pr$, $q^2 + 2qr$, $2pq$,
and $r^2$ denote the probabilities of the four blood groups A, B, AB, and
O, respectively, where $p + q + r = 1$. (It is to be understood that p,
q, and r represent the probabilities in the population of the genes for
A, B, and O, respectively.) (a) In a random sample of $n = n_1 + n_2 + n_3 + n_4$
people from the population in question it is found that $n_1$, $n_2$, $n_3$,
and $n_4$ have blood groups A, B, AB, and O, respectively. Find the
maximum likelihood estimators of p, q, and r. (b) Find the maximum
likelihood estimates of p, q, and r when the actual frequencies for the
four blood groups are $n_1 = 192$, $n_2 = 119$, $n_3 = 62$, and $n_4 = 127$.

5.65. Since the likelihood function is a function of parameters, the maximum
likelihood estimators for multivariate distributions are found by the
method used with univariate distributions. Let $(x_{11}, x_{21}), \ldots,
(x_{1n}, x_{2n})$ denote a random sample of n pairs taken from a bivariate normal
distribution with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. (a) Find the maximum
likelihood estimator of $\rho$ when $\mu_1 = \mu_2 = 0$ and $\sigma_1^2 = \sigma_2^2 = 1$. (b)
Find the maximum likelihood estimators of $\mu_1$, $\mu_2$, and $\rho$ when
$\sigma_1^2 = \sigma_2^2 = 1$. (c) Find the maximum likelihood estimators of $\sigma_1^2$, $\sigma_2^2$, and $\rho$
when $\mu_1 = \mu_2 = 0$. (d) Find the maximum likelihood estimators of
$\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.

5.66. In Sec. 5.4.1 we considered only linear combinations of random
variables. Sometimes it is desirable that we find the mean and variance of
an arbitrary function $y = h(x_1, \ldots, x_k)$ of k variables $x_1, \ldots, x_k$. It is
usually difficult to find the exact values of the mean and variance of y.
However, if $y = h(x_1, \ldots, x_k)$ is approximately linear over practically
the whole range of variation of $(x_1, \ldots, x_k)$, then y can be adequately
represented by the linear terms of its Taylor series expansion. Then
the mean and variance can be found by the methods already described.
For the case where k = 2, let the means, variances, and covariance
of $x_1$ and $x_2$ be denoted by $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\sigma_{12}$. On expanding
$y = h(x_1, x_2)$ about the point $(\mu_1, \mu_2)$ and neglecting terms of degree higher
than one in $(x_1 - \mu_1)$ and $(x_2 - \mu_2)$, we have

$$y \approx h(\mu_1, \mu_2) + h_1(\mu_1, \mu_2)(x_1 - \mu_1) + h_2(\mu_1, \mu_2)(x_2 - \mu_2)$$

where

$$h_1(\mu_1, \mu_2) = \frac{\partial h}{\partial x_1}, \qquad h_2(\mu_1, \mu_2) = \frac{\partial h}{\partial x_2}$$

are partial derivatives of h with respect to $x_1$ and $x_2$ evaluated at $(\mu_1, \mu_2)$.
Hence, using the properties of expectation and Theorem 5.6, we have
immediately

$$E(y) \approx h(\mu_1, \mu_2) \tag{5.86}$$

and

$$V(y) \approx h_1^2(\mu_1, \mu_2)\sigma_1^2 + h_2^2(\mu_1, \mu_2)\sigma_2^2 + 2 h_1(\mu_1, \mu_2) h_2(\mu_1, \mu_2)\sigma_{12} \tag{5.87}$$

since

$$E[h_1(\mu_1, \mu_2)(x_1 - \mu_1)] = h_1(\mu_1, \mu_2)E(x_1 - \mu_1) = 0$$

and $V[y - h(\mu_1, \mu_2)] = V(y)$. If $x_1$ and $x_2$ are independent, the last term
of Eq. (5.87) vanishes. The derivation of expressions for E(y) and V(y)
in the general case is left as another exercise for the student.

Note: As a rule, it appears that Formulas (5.86) and (5.87) are
adequate when the standard deviation of any variate is not greater than
20 per cent of its mean. (This is just an empirical rule and should be
used with caution.)

(a) Find the approximate mean and variance of >- = (at, — (b) 

In case $x_1$ and $x_2$ are independent, an example of (a) is given in Statistical
Analysis in Chemistry and the Chemical Industry, by Bennett and Franklin
(pp. 52-54), in which y is the percentage of moisture in a sample of coal,
$x_1$ is the original weight, and $x_2$ is the final weight after heating. Study
this reference. Other examples appear in Statistical Theory with Engi- 
neering Applications, by Hald (pp. 246-51), and Statistical Methods 
in Research and Production, by Davies (pp. 48-50). 
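
The linear approximation of Exercise 5.66 can be sketched numerically. The code below is an illustration under stated assumptions: the partial derivatives are estimated by central differences rather than found analytically, the function `delta_method` and its arguments are hypothetical names, and the example function (a product of two independent variates) is chosen only because its exact variance is known.

```python
def delta_method(h, mu1, mu2, var1, var2, cov12=0.0, eps=1e-6):
    """Approximate E(y) and V(y) for y = h(x1, x2) by Formulas (5.86) and (5.87)."""
    # Central-difference estimates of the partial derivatives at (mu1, mu2).
    h1 = (h(mu1 + eps, mu2) - h(mu1 - eps, mu2)) / (2 * eps)
    h2 = (h(mu1, mu2 + eps) - h(mu1, mu2 - eps)) / (2 * eps)
    mean = h(mu1, mu2)
    variance = h1 ** 2 * var1 + h2 ** 2 * var2 + 2 * h1 * h2 * cov12
    return mean, variance


# Hypothetical example: y = x1 * x2 with independent variates whose
# standard deviations are 10 per cent of their means.
approx_mean, approx_var = delta_method(lambda a, b: a * b, 10.0, 5.0, 1.0, 0.25)
print(approx_mean, approx_var)
```

For a product of independent variates the exact variance is $\mu_1^2\sigma_2^2 + \mu_2^2\sigma_1^2 + \sigma_1^2\sigma_2^2$, here 50.25, so the approximation of about 50 omits only the small term $\sigma_1^2\sigma_2^2$.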

5.67. Often, as in Example 5.14, we make transformations of variables. The
evaluation of single and multiple integrals is often simplified by applying
the appropriate transformation. Further, we are often required to obtain
the density function (or distribution function) of some statistic
$g = g(x_1, \ldots, x_n)$ defined in terms of the variates $x_1, \ldots, x_n$ from
an n-fold integral in which the integrand is the joint density function
$f(x_1, \ldots, x_n)$. For these reasons we state and discuss an important
theorem involving transformations of double integrals (the ideas are
easily extended to n-fold integrals). It is assumed that the reader is
already familiar with transformations of integrals in the univariate case.

If the transformation $x_1 = t_1(y_1, y_2)$, $x_2 = t_2(y_1, y_2)$ represents a
continuous one-to-one mapping of the closed region R of the $x_1x_2$-plane
on the region R' of the $y_1y_2$-plane, and if the functions $t_1$ and $t_2$ have
continuous first derivatives and their Jacobian

$$J\left(\frac{x_1, x_2}{y_1, y_2}\right) = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2ex] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix} = \frac{\partial x_1}{\partial y_1}\,\frac{\partial x_2}{\partial y_2} - \frac{\partial x_1}{\partial y_2}\,\frac{\partial x_2}{\partial y_1}$$

is either everywhere positive or everywhere negative, then

$$\iint_R f(x_1, x_2)\, dx_1\, dx_2 = \iint_{R'} f[t_1(y_1, y_2), t_2(y_1, y_2)] \left| J\left(\frac{x_1, x_2}{y_1, y_2}\right) \right| dy_1\, dy_2 \tag{5.88}$$

where

$$\left| J\left(\frac{x_1, x_2}{y_1, y_2}\right) \right|$$

denotes the absolute value of the Jacobian. To avoid having to solve
for the inverse functions in a set of equations, it is convenient to know
that, under very general conditions, the Jacobian of the inverse system
of functions is the reciprocal of the Jacobian of the original system; i.e.,

$$J\left(\frac{x_1, x_2}{y_1, y_2}\right) = \frac{1}{J\left(\dfrac{y_1, y_2}{x_1, x_2}\right)}$$

Thus, in Example 5.14, if we let x, = x, and 


(5.89)
Hence, from the fact that


and the above theorems, we may write

+ j'^^^2dx,'^dX = 1 

(From Exercise 5.33 we know that the function of $\bar{x}$ inside the brackets
is a marginal density function.)

(a) Find the density function of the sampling distribution of means
of samples of size three drawn at random from the parent population
of Example 5.14.

(b) Find the density function of the sampling distribution of means 

of samples of size n drawn at random from the distribution with density 
function /(x) as /-», x S 0 _ . 

(c) Solve (a) if the population has density function /{x, 0) *(1 + 

0 > 0. 0 ^ X ^ 1 

(d) Find the density function of the sampling distributions of
variances of samples of size two drawn at random from the parent population
of Example 5.14.

5.68. The Chebyshev inequality applies to any random variable [for example,
the sample mean $\bar{x}$ considered in Eq. (5.32), the range, the median],
provided it has a mean and variance. Now, we state and prove the general
theorem. Let y be a random variable with mean $\mu = E(y)$ and variance
$\sigma^2 = V(y)$. Then if k is any positive number,

$$P[\,|y - \mu| \ge k\,] \le \frac{\sigma^2}{k^2} \tag{5.90}$$

Proof. Let f(y) be the density function of y and note that

$$\sigma^2 = \int_{-\infty}^{\infty} (y - \mu)^2 f(y)\, dy$$

may be written as

$$\sigma^2 = \int_{-\infty}^{\mu - k} (y - \mu)^2 f(y)\, dy + \int_{\mu - k}^{\mu + k} (y - \mu)^2 f(y)\, dy + \int_{\mu + k}^{\infty} (y - \mu)^2 f(y)\, dy \tag{5.91}$$

Since the second integral on the right-hand side of Eq. (5.91) is
nonnegative, we have

$$\sigma^2 \ge \int_{-\infty}^{\mu - k} (y - \mu)^2 f(y)\, dy + \int_{\mu + k}^{\infty} (y - \mu)^2 f(y)\, dy$$

or

$$\sigma^2 \ge \int_{|y - \mu| \ge k} (y - \mu)^2 f(y)\, dy$$

In this last expression the integrand is always at least as large as $k^2 f(y)$
over the range of integration. Thus, we may write

$$\sigma^2 \ge k^2 \int_{|y - \mu| \ge k} f(y)\, dy = k^2 P[\,|y - \mu| \ge k\,]$$

and it follows that Eq. (5.90) holds.
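
The inequality can be checked exactly on a small case. The sketch below uses a fair six-sided die, a discrete stand-in chosen here for illustration (the proof above is written for a density function), and exact rational arithmetic; all names are hypothetical.

```python
from fractions import Fraction

faces = range(1, 7)
prob = Fraction(1, 6)  # each face of a fair die is equally likely

mu = sum(prob * y for y in faces)               # mean of one toss
var = sum(prob * (y - mu) ** 2 for y in faces)  # variance of one toss


def tail_probability(k):
    """Exact P[|y - mu| >= k] for the die."""
    return sum(prob for y in faces if abs(y - mu) >= k)


def chebyshev_bound(k):
    """Upper bound sigma^2 / k^2 supplied by the inequality."""
    return var / Fraction(k) ** 2


checks = {k: (tail_probability(k), chebyshev_bound(k))
          for k in (Fraction(3, 2), 2, Fraction(5, 2))}
for k, (tail, bound) in checks.items():
    print(f"k = {k}: P = {tail} <= bound = {bound}")
```

The bound is loose, and for small k it can exceed 1, in which case it says nothing; its value lies in holding for every distribution with a finite variance.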

(a) Use Chebyshev's inequality in the proof of the following
important property known as the weak law of large numbers. Let
$y_1, y_2, \ldots$ be an arbitrary sequence of random variables with expectations
$E(y_1), E(y_2), \ldots$. Assume that

$$\sum_{i=1}^{n} y_i$$

has a variance for each positive integer n. If



approaches zero as n approaches $\infty$, and if k is a positive number, then


P 




>k 


approaches 0 as n approaches $\infty$


(5.92) 


or 



as $n \to \infty$


(5.93) 


where the symbol $\to$ denotes "approaches."

(b) Prove the following corollary to Eqs. (5.92) and (5.93). Let $\bar{x}$ be
the sample mean of a random sample of size n drawn from a population
with mean $\mu$ and variance $\sigma^2$. If k > 0, then

$$P[\,|\bar{x} - \mu| \ge k\,] \to 0 \quad \text{as } n \to \infty \tag{5.94}$$

or

$$P[\,|\bar{x} - \mu| < k\,] \to 1 \quad \text{as } n \to \infty \tag{5.95}$$

Note: See Feller [10] and Munroe [29] for further discussions of
the topics in this exercise as well as discussions on the strong law of
large numbers.

5.69. We said earlier that the general proof of the central-limit theorem is
beyond the scope of this book, but stated that we would give a restricted
proof later. Now we give this proof for distributions which have moment
generating functions. (It can be shown that a distribution function is
uniquely determined by its MGF.) The proof consists in showing that
the MGF for the sample mean approaches the MGF for the normal
distribution and in applying a theorem which states that "two random
variables which have the same MGF have the same density functions
except possibly at points of discontinuity." (Actually, the MGF is of
more importance in determining the distribution of sample statistics
than it is in finding moments.)

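Proof. Let f(x) be the density function of any random variable x
with mean $\mu$ and variance $\sigma^2$ which has a MGF. The MGF, $M_y(t)$,
of the standardized variate $y = (x - \mu)/\sigma$ is given by

$$M_y(t) = \int_{-\infty}^{\infty} e^{t(x - \mu)/\sigma} f(x)\, dx \tag{5.96}$$

By Theorem 5.3, the mean and variance of the random variable $\bar{x}$ are
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}}^2 = \sigma^2/n$. So the MGF, $M_z(t)$, for $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$
may be written, using the expectation notation, as

$$M_z(t) = E\left[e^{tz}\right] = E\left[\exp\left(\frac{t}{\sqrt{n}} \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma}\right)\right]$$

Since $x_1, \ldots, x_n$ is a random sample with $x_i$ having density functions
$f(x_i) = f(x)$ $(i = 1, \ldots, n)$, the joint density function of the set
$(x_1, \ldots, x_n)$ of variates is given by

$$f(x_1, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$$

Thus, we have

$$M_z(t) = \prod_{i=1}^{n} \int_{-\infty}^{\infty} e^{(t/\sqrt{n})(x_i - \mu)/\sigma} f(x_i)\, dx_i$$

or

$$M_z(t) = \left[M_y\!\left(\frac{t}{\sqrt{n}}\right)\right]^n$$

since by Eq. (5.96) each factor has MGF $M_y(t/\sqrt{n})$. Using Eq. (3.44),
we may write

$$M_y\!\left(\frac{t}{\sqrt{n}}\right) = 1 + \mu'_{1:y}\frac{t}{\sqrt{n}} + \mu'_{2:y}\frac{t^2}{2!\,n} + \mu'_{3:y}\frac{t^3}{3!\,n^{3/2}} + \cdots \tag{5.97}$$

Since $y = (x - \mu)/\sigma$, $\mu'_{1:y} = \mu_y = (\mu_x - \mu)/\sigma = 0$, $V(y) = [V(x)]/\sigma^2 = 1$,
and $\mu'_{2:y} - (\mu'_{1:y})^2 = \mu'_{2:y} = 1$. Thus Eq. (5.97) may be written as

$$M_y\!\left(\frac{t}{\sqrt{n}}\right) = 1 + \frac{1}{n}\left(\frac{t^2}{2} + \mu'_{3:y}\frac{t^3}{3!\sqrt{n}} + \cdots\right)$$

Remembering that, by definition,

$$\lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^n = e^x$$

and observing that

$$M_z(t) = \left[1 + \frac{1}{n}\left(\frac{t^2}{2} + \mu'_{3:y}\frac{t^3}{3!\sqrt{n}} + \cdots\right)\right]^n$$

is of the form of $[1 + (x/n)]^n$, we may write

$$\lim_{n \to \infty} M_z(t) = e^{t^2/2} \tag{5.98}$$

since

$$\mu'_{3:y}\frac{t^3}{3!\sqrt{n}} + \cdots$$

vanishes as $n \to \infty$. According to Exercise 3.49(a), the moment generating
function of the standard normal distribution is

$$M_u(t) = e^{t^2/2}$$

Since in the limit the MGF of z is the same as the MGF of u, we conclude
that in the limit z must be normally distributed with mean zero and
variance 1, no matter what the distribution of x (so long as it has a
MGF). Thus in the limit $\bar{x}$ is normally distributed with mean $\mu$ and
variance $\sigma^2/n$.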
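
The convergence of the MGF in Exercise 5.69 can be watched numerically. In the sketch below the parent population is assumed, purely for illustration, to be exponential with mean 1 (so $\mu = \sigma = 1$), for which the MGF of the standardized variate $y = x - 1$ is $e^{-s}/(1 - s)$ for $s < 1$; the function names are hypothetical.

```python
import math


def mgf_standardized_exponential(s):
    """MGF of y = x - 1, where x is exponential with mean 1; valid for s < 1."""
    return math.exp(-s) / (1.0 - s)


def mgf_z(t, n):
    """MGF of the standardized sample mean: [M_y(t / sqrt(n))]^n."""
    return mgf_standardized_exponential(t / math.sqrt(n)) ** n


t = 1.0
limit = math.exp(t ** 2 / 2)  # MGF of the standard normal variate at t
sample_sizes = (10, 100, 10_000, 1_000_000)
errors = [abs(mgf_z(t, n) - limit) for n in sample_sizes]

for n, err in zip(sample_sizes, errors):
    print(f"n = {n:>9}: M_z(t) = {mgf_z(t, n):.6f}, |error| = {err:.6f}")
```

The error shrinks roughly in proportion to $1/\sqrt{n}$, reflecting the leading neglected term, which involves the third moment.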

5.70. Sampling from a normal population is so important that we devote the
next four chapters to this topic. However, in order to illustrate some of
the principles of this chapter the student should prove that "if x is normally
distributed with mean $\mu$ and variance $\sigma^2$, then the sampling distribution
of the means of random samples of size n is normally distributed with
mean $\mu$ and variance $\sigma^2/n$."

Outline of the derivation. Write the joint density function of $x_1, \ldots, x_n$
as the product of n univariate density functions; let

$$x_n = n\bar{x} - \sum_{i=1}^{n-1} x_i$$

so as to obtain a joint density function of $x_1, \ldots, x_{n-1}, \bar{x}$, and then
find the marginal density function of $\bar{x}$ by integrating over $x_1, \ldots, x_{n-1}$.
The marginal density function is the required normal density function.


SAMPLING FROM NORMAL POPULATIONS


In many applications it is reasonable to assume that the parent
population is approximately normally distributed, or that the data can be
transformed so as to be approximately normally distributed. In either case,
sampling distributions derived from a parent normal distribution are of prime
importance in problems of estimation and tests of hypotheses. We state and
illustrate some very important theorems concerning sampling distributions.
(A proof or an outline of a proof of each of these theorems is found among
the exercises at the end of the chapter.)

6.1. INTRODUCTION

The concept of a sampling distribution of any statistic determined
from any population should be clear by now. In Chap. 5, we showed how
the mean and variance of sampling distributions of means, totals, and
linear functions of random variables are related to the means and variances
of the parent populations. In the exercises, sampling distributions of certain
other statistics, such as median and range, determined from samples drawn
from particular populations were considered. In a special case, we
illustrated how the density function of the sample mean $\bar{x}$ might be found by
using the marginal density function. But the emphasis was generally on
understanding the nature of the relationship between parameters in the
parent population and parameters in the sampling distributions.

In this chapter we study the sampling distributions from the point of
view of density functions obtained from a parent normal density function.
Such functions enable us to compute tables useful in obtaining confidence
intervals and in testing hypotheses. In order to point up the usefulness of
the theorems of this chapter, we introduce methods for finding confidence
intervals and testing hypotheses without actually theoretically justifying the
techniques.

In Chap. 5 we discussed properties of estimators. We learned that, for
a particular sample, the estimator takes on a particular value which may be
used to estimate the unknown value of a parameter. Such an estimate is
sometimes called a point estimate. Unfortunately, a point estimate, even
a best point estimate, may deviate so much from the parameter that a single
value may not be considered satisfactory. Thus, it is customary to estimate
a parameter $\theta$ by any value in an interval.

Let $x_1, \ldots, x_n$ be a random sample from a distribution with density
function f(x) and parameter $\theta$. Let $t = t(x_1, \ldots, x_n)$, a function of the n
sample observations, be a statistic corresponding to $\theta$. Let f(t) be the density
function of the random variable t. If it is possible to determine two values,
$t_1$ and $t_2$, of the statistic t such that, for the parameter being estimated,

$$P[t_1 < \theta < t_2] = \gamma \tag{6.1}$$

where $\gamma$ is some fixed probability, the set of values between $t_1$ and $t_2$ inclusive
is called a confidence interval of $\theta$. Thus, any value in the interval is considered
a possible value of the parameter $\theta$. The values $t_1$ and $t_2$ are called the
confidence limits. The measure of probability associated with the confidence
interval is called the confidence level or the confidence coefficient. References
[3, 7, 9] give theoretical developments of confidence intervals.

Sometimes we refer to the confidence interval as the $100\gamma$ per cent
confidence interval. It should be noted that for a particular sample, limits are
found which either do or do not include the true value of $\theta$. Thus, when we
say that $\theta$ lies in a $100\gamma$ per cent confidence interval, we mean that this is
true, on the average, $100\gamma$ per cent of the time. That is, if an experiment is
repeated a large number of times and a $100\gamma$ per cent interval is computed
each time, then in approximately $100\gamma$ per cent of the experiments, the
limits include the true value of $\theta$.
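
The repeated-sampling interpretation can be imitated by simulation. The sketch below is only illustrative and rests on assumptions not made in the text: a normal parent population with hypothetical values $\mu = 50$ and $\sigma = 4$, a known $\sigma$, a symmetric interval for the mean, and an arbitrary fixed seed.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, gamma, repetitions, seed=1):
    """Fraction of simulated 100*gamma per cent intervals that include mu."""
    rng = random.Random(seed)
    t = NormalDist().inv_cdf((1 + gamma) / 2)  # upper (1 - gamma)/2 point
    half_width = t * sigma / n ** 0.5          # sigma is assumed known
    hits = 0
    for _ in range(repetitions):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / repetitions


rate = coverage(mu=50.0, sigma=4.0, n=25, gamma=0.95, repetitions=2000)
print(f"observed coverage: {rate:.3f}")
```

Each simulated interval either includes $\mu$ or does not; only the long-run fraction is close to $\gamma$.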

In general, we say that a statistical hypothesis is an assumption
concerning the density function of a random variable (or a set of random variables),
and a test of hypothesis is a procedure for deciding whether to reserve
judgment or reject the hypothesis. The definition of hypothesis is broad
enough to cover many different types found in the study of statistics. In
a restricted case, we suppose the form of the density function to be known,
so that the statistical hypothesis specifies the value (or values) of each
parameter of the density function. That is, the hypothesis specifies some one
member (or subset of members) of a family of density functions. In particular,
in the simplest cases of a statistical hypothesis only one parameter is assumed
to have a single value, the form of the density function along with all other
parameters being known. Any hypothesis which assumes for a given density
function that each parameter has a single fixed value is called a simple
hypothesis; otherwise it is called a composite hypothesis. Many hypotheses
will be amply illustrated in the remaining sections of this book.

Associated with each hypothesis there are often several test procedures.
In this chapter we use that procedure which best illustrates the theorem
under consideration. To apply a test in the simplest case, select an estimator
of the parameter of the hypothesis. Then any rule which divides the set of
all possible values of an estimator into two sets, one being the region of
rejection, called the critical region, and the other the region of indecision,
is called a test procedure for the statistical hypothesis. If, as a result of an
experiment, the value of an estimator falls in the critical region, we reject
the hypothesis; otherwise we fail to reject the hypothesis. When the
hypothesis is rejected, we usually accept some alternative value (or values)
of the parameter associated with the choice of the critical region. The critical
region, as we shall see shortly, may be composed of two or more disjoint
sets. The probability that a value of the estimator falls in the critical region
when the hypothesis is true is called the significance level of the test and is
denoted by $\alpha$. Since the main emphasis of the test procedure is to reject the
hypothesis, we generally refer to the statistical hypothesis under test as the
null hypothesis. To illustrate the meaning of the above terms, consider the
following example.

Example 6.1. Use the teak tree data of Exercise 2.10. (a) Find a point
estimate of the population mean of teak trees. (b) Find a 95 per cent
confidence interval for the true mean diameter. (c) Test the hypothesis that
the true mean diameter is 22 in. when the significance level is five per cent.
Suppose the 1088 diameters represent a random sample of a large
population of teak trees, all of the same age and grown in the same general area
and environment of India. Use the sample mean to make statements about
the population mean. Since the sample size is so large, we may actually
assume the sample mean to be approximately normally distributed.
According to Example 3.10, the mean and variance of the sample are $\bar{x} = 21.69$
and $s^2 = 34.5156$. Thus 21.69 is not only a point estimate but may be
considered a best point estimate of the population mean $\mu$.

In order to solve (b) and (c) we assume that the 1088 diameters represent
a random sample from a single population and that the sample size is large
enough so that the sampling distribution of means is closely approximated
by the normal distribution. Finally, we assume that n = 1088 is large
enough so that we may use 34.5156 as the population variance. Thus the
standard deviation of the sampling distribution of $\bar{x}$, that is, the standard
error of $\bar{x}$, is given by

$$\sigma_{\bar{x}} = \sqrt{\frac{34.5156}{1088}} = 0.1781$$

It can be shown that 95 per cent confidence limits of $\mu$ are given by
$\bar{x} - t_{.025}\sigma_{\bar{x}}$ and $\bar{x} + t_{.025}\sigma_{\bar{x}}$, where $t_{.025}$ is the upper 2.5 per cent value of
the standard normal variate t. Thus, in our example the confidence limits are

$$\bar{x} \pm t_{.025}\sigma_{\bar{x}} = 21.69 \pm (1.960)(0.1781) = 21.69 \pm 0.35$$

or 21.34 and 22.04.

Hence, the 95 per cent confidence interval for $\mu$, symmetric about $\bar{x}$,
is given by

$$21.34 < \mu < 22.04 \tag{6.2}$$
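
The arithmetic behind interval (6.2) can be reproduced in a few lines. This is a minimal sketch: the summary figures are those quoted in the example, while the variable names and the use of Python's `statistics.NormalDist` for the normal percentage point are choices made here.

```python
from math import sqrt
from statistics import NormalDist

xbar, s2, n = 21.69, 34.5156, 1088  # summary figures quoted in Example 6.1

se = sqrt(s2 / n)                    # standard error of the mean
t = NormalDist().inv_cdf(0.975)      # upper 2.5 per cent point, about 1.960
lower, upper = xbar - t * se, xbar + t * se

print(f"standard error = {se:.4f}")
print(f"95 per cent limits: {lower:.2f} and {upper:.2f}")
```

Changing 0.975 to 0.995 gives the wider 99 per cent limits.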

The interval (6.2) is determined from a single sample of n values. Another
random sample would usually lead to a different interval. That is, the
limits vary from sample to sample. In fact, in a long series of random
samples of size n drawn from a normal population with mean $\mu$ and variance
$\sigma^2 = 34.5156$, we would expect about 95 per cent of the resulting intervals
to include the true fixed value of $\mu$. It is for this reason that we call the
typical interval

$$\bar{x} - t_{.025}\sigma_{\bar{x}} \le \mu \le \bar{x} + t_{.025}\sigma_{\bar{x}} \tag{6.3}$$

a 95 per cent confidence interval. Since the sampling distribution of $\bar{x}$ is
symmetric and the limits are equidistant from a sample mean, Relation
(6.3) is sometimes called a symmetric 95 per cent confidence interval. If
Relation (6.3) were used to find intervals for each of 100 random samples
of size n, we would expect about 95 of the intervals to include $\mu$, not
knowing which of about five would fail.

When we use a 95 per cent confidence interval, we feel that our chance
of covering the true mean is fairly good. If we wish to be more confident
that the interval covers $\mu$, we should use a higher confidence level, say 98
or 99 per cent. To be completely certain that the interval covers $\mu$, we could
take as our interval all values from $-\infty$ to $+\infty$. But such a statement does
not really tell us anything about the mean of the population. In fact, the
more confident we wish to be that an interval includes the true value of the
parameter, the less we have to be sure of.

To find a nonsymmetric 95 per cent confidence interval, we let
$\bar{x} - t_{.01}\sigma_{\bar{x}}$ and $\bar{x} + t_{.04}\sigma_{\bar{x}}$ be the lower and upper limits, respectively.
Then, since $t_{.01} = 2.326$ and $t_{.04} = 1.751$, the limits become $x_1 = 21.69 -
(2.326)(0.1781) = 21.28$ and $x_2 = 21.69 + (1.751)(0.1781) = 22.00$, so the
95 per cent confidence interval is

$$21.28 \le \mu \le 22.00$$

In this case it is clear that the symmetric 95 per cent confidence interval is
shorter than the nonsymmetric 95 per cent confidence interval. It can be
shown that this is always true for a fixed confidence level and a fixed sample
size when t is symmetrically distributed about its mean.
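
The comparison of interval lengths can be extended to every way of splitting the 5 per cent between the two tails. The sketch below assumes the standard error 0.1781 of the example and searches a grid of splits in steps of 0.001; the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def interval(xbar, se, lower_tail, upper_tail):
    """Limits xbar - t_a * se and xbar + t_b * se for tail areas a and b."""
    t_lower = STD_NORMAL.inv_cdf(1 - lower_tail)
    t_upper = STD_NORMAL.inv_cdf(1 - upper_tail)
    return xbar - t_lower * se, xbar + t_upper * se


def length(limits):
    return limits[1] - limits[0]


xbar, se = 21.69, 0.1781
nonsymmetric = interval(xbar, se, 0.01, 0.04)
symmetric = interval(xbar, se, 0.025, 0.025)

# Every split a + b = 0.05 on a grid; the shortest should be the equal split.
splits = [i / 1000 for i in range(1, 50)]
lengths = {a: length(interval(xbar, se, a, 0.05 - a)) for a in splits}
best_split = min(lengths, key=lengths.get)

print(f"nonsymmetric: {nonsymmetric[0]:.2f} to {nonsymmetric[1]:.2f}")
print(f"lengths: symmetric {length(symmetric):.4f}, nonsymmetric {length(nonsymmetric):.4f}")
print("shortest interval at lower-tail area", best_split)
```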

In the test of a null hypothesis the choice of a critical region depends
on what we expect the parameter under test to be if the hypothesized value
is not true. Lehmann [8] and others [3, 7, 9, 15, 16] give some principles which
are useful in determining the most desirable critical region in a given
experimental situation. The intuitive method we give should generally be
adequate for the relatively simple cases considered in this book.
Let us suppose, in our example, that the true mean is equal to or less than
22. Since the null hypothesis is $\mu = 22$ in., the alternative values for $\mu$ are
those less than 22 in. That is, the so-called alternative hypothesis is that
$\mu < 22$ in. We require a critical region, defined in terms of $\bar{x}$, which allows
us to accept the alternative hypothesis when the null hypothesis is rejected.
It seems reasonable to suppose that values of $\bar{x}$ somewhat less than 22 are
the only values which would make us want to reject the null hypothesis in
favor of the alternative hypothesis. Thus, we define the critical region to
be that set of values of $\bar{x}$ not exceeding $\bar{x}_0$, for which $P[\bar{x} \le \bar{x}_0] = \alpha$ when the
null hypothesis is true. That is, $\bar{x}_0$ is selected so that

$$P[\bar{x} \le \bar{x}_0 \mid \mu = 22,\ \sigma_{\bar{x}} = 0.178] = 0.05$$

or

$$P\left[t \le \frac{\bar{x}_0 - 22}{0.178}\right] = 0.05$$

Since

$$P[t \le -1.645] = 0.05$$

it is clear that $\bar{x}_0$ must satisfy the equation

$$\frac{\bar{x}_0 - 22}{0.178} = -1.645$$

Thus, $\bar{x}_0 = 22 - 1.645(0.178) = 21.71$, and the critical region is made up
of all values of $\bar{x}$ for which $\bar{x} \le 21.71$. Since our sample mean $\bar{x} = 21.69$
falls in the critical region, though only barely, we reject the hypothesis that
$\mu = 22$ in. in favor of the alternative that $\mu < 22$ in. Had the sample mean
fallen in the region of indecision instead, we would have failed to reject the
hypothesis. That outcome would not imply that the mean is 22 in.; it would
simply mean that we do not have enough evidence to say the true mean is
less than 22 in.
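
The critical value in part (c) can be recomputed directly from the quantities stated in the example ($\mu = 22$ under the null hypothesis, $\sigma_{\bar{x}} = 0.178$, $\alpha = 0.05$, $\bar{x} = 21.69$). The following is a minimal sketch with illustrative variable names.

```python
from statistics import NormalDist

mu0, se, alpha, xbar = 22.0, 0.178, 0.05, 21.69  # figures from Example 6.1

t_alpha = NormalDist().inv_cdf(alpha)  # lower 5 per cent point, about -1.645
critical_value = mu0 + t_alpha * se    # boundary of the one-sided critical region
t_observed = (xbar - mu0) / se         # standardized sample mean
reject = xbar <= critical_value

print(f"critical value = {critical_value:.2f}")
print(f"observed t = {t_observed:.3f}, reject null hypothesis: {reject}")
```

Comparing the standardized value of the sample mean with $-1.645$ is equivalent to comparing $\bar{x}$ with the critical value.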

Some points in connection with the above problem will be discussed
in detail later on. At this time we wish to examine some properties of
sampling distributions of measures of central tendency derived from normal
parent populations.
6.2. SOME PROPERTIES OF LINEAR FUNCTIONS OF RANDOM VARIABLES

If the parent population is normal, there is no better estimator of the 
population mean than the sample mean. In spite of this, other statistics 
which measure central tendency are sometimes used in place of the sample 
mean. Generally, these statistics are used because they are already available 
or are easy and relatively inexpensive to obtain. For example, the census 
bureau and many governmental agencies give most of their summary data 
in the form of medians. Mid-ranges may be computed in certain industrial 
processes almost as fast as the data are collected. Actually, if data are easy 
and relatively inexpensive to collect, it may be possible to place more reliance 
on medians or mid-ranges of samples of size sixty than on means of samples of 
size thirty. It may also be possible to make the statistical analysis in less time. 

As a matter of convenience for the reader, we now give in one group 
statements of seven important theorems relating to measures of central 
tendency and the normal distribution. Following the statements illustra- 
tions of applications are presented. 

Theorem 6.1. Let $x_i$ be the ith observation in a random sample of size n
with $x_i$ drawn from a normal population with mean $\mu_i$ and variance $\sigma_i^2$. Let
a linear combination l of these random variables be defined by

$$l = \sum_{i=1}^{n} a_i x_i$$

where $a_i$ are real constants not all zero. Then the random variable l is normally
distributed with mean

$$\mu_l = \sum_{i=1}^{n} a_i \mu_i$$

and variance

$$\sigma_l^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2$$

Symbolically, the theorem may be stated as: if $x_i \sim n(\mu_i, \sigma_i^2)$, the $x_i$'s are
independently distributed, and $l = \sum a_i x_i$, some $a_i \ne 0$; then
$l \sim n(\mu_l = \sum a_i \mu_i,\ \sigma_l^2 = \sum a_i^2 \sigma_i^2)$.

Theorem 6.2. If x is normally distributed with mean $\mu$ and variance $\sigma^2$,
then the sampling distribution of the means of random samples of size n is
normally distributed with mean $\mu$ and variance $\sigma^2/n$. Symbolically, if
$x \sim n(\mu, \sigma^2)$, then $\bar{x} \sim n(\mu_{\bar{x}} = \mu,\ \sigma_{\bar{x}}^2 = \sigma^2/n)$.

Theorem 6.3. If x is normally distributed with mean $\mu$ and variance $\sigma^2$, then
the sampling distribution of the median $\tilde{x}$ of random samples of size n
approaches the normal distribution with mean $\mu$ and variance $\pi\sigma^2/2n$ as n
becomes large. Symbolically, if $x \sim n(\mu, \sigma^2)$, then
$\tilde{x} \sim n(\mu_{\tilde{x}} = \mu,\ \sigma_{\tilde{x}}^2 = \pi\sigma^2/2n)$ as n becomes large.

Theorem 6.4. If x represents the number of successes in n independent
trials of an event for which p is the probability of success in a single trial,
then

$$y = \frac{x - np}{\sqrt{np(1 - p)}}$$

has a distribution that approaches the normal distribution with mean 0 and
variance 1 as the number of trials becomes increasingly large. Symbolically,
if $x \sim b(x; n, p)$, then $y \sim n(\mu = 0, \sigma^2 = 1)$ if n is large. [For a fixed p,
n is considered sufficiently large if $np \ge 5$ and $n(1 - p) \ge 5$.]

Theorem 6.5. The proportion of successes x/n, where x and n are defined
as in Theorem 6.4, is approximately normally distributed with mean p and
variance $p(1 - p)/n$ when n is sufficiently large.

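Theorem 6.6. If $\bar{x}_1$ and $\bar{x}_2$ are normally and independently distributed with
means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2/n_1$ and $\sigma_2^2/n_2$, respectively, then
$d = \bar{x}_1 - \bar{x}_2$ is normally distributed with mean $\mu_d = \mu_1 - \mu_2$ and variance

$$\sigma_d^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

It is understood that $n_1$ and $n_2$ denote the sample sizes. Symbolically, if $\bar{x}_1$
and $\bar{x}_2$ are determined from random samples of sizes $n_1$ and $n_2$, respectively, and
are independently distributed with $\bar{x}_i \sim n(\mu_i, \sigma_i^2/n_i)$ (i = 1, 2), then

$$d = \bar{x}_1 - \bar{x}_2 \sim n\!\left(\mu_d = \mu_1 - \mu_2,\ \sigma_d^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$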

Theorem 6.7. Let $x_i$ (i = 1, 2) represent the number of successes in $n_i$
independent trials of an event for which $p_i$ is the probability of success in
a single trial. Let $p'_i = x_i/n_i$ denote the sample proportions of success. Then,
when the numbers of trials $n_1$ and $n_2$ are sufficiently large, the difference
$d = p'_1 - p'_2$ in the sample proportions is approximately normally distributed
with mean $\mu_d = p_1 - p_2$ and variance

$$\sigma_d^2 = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$$

Symbolically, if $x_i \sim b(x_i; n_i, p_i)$ (i = 1, 2), and $x_1$ and $x_2$ are independently
distributed, then

$$d = p'_1 - p'_2 \sim n\!\left[\mu_d = p_1 - p_2,\ \sigma_d^2 = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right]$$

The proofs of Theorems 6.1 and 6.4 follow from properties of moment
generating functions and are outlined in the next set of exercises. Theorems
6.2, 6.5, 6.6, and 6.7 are actually corollaries of Theorems 6.1 and 6.4. For
example, if we let $a_i = 1/n$ in Theorem 6.1, we have Theorem 6.2. Further,
if in Theorem 6.1 the two random variables are $\bar{x}_1$ and $\bar{x}_2$, $a_1 = 1$, $a_2 = -1$,
and the variances are $\sigma_{\bar{x}_1}^2$ and $\sigma_{\bar{x}_2}^2$, then we have Theorem 6.6. The proof
of Theorem 6.3 is the most difficult.
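The way Theorems 6.2 and 6.6 drop out of Theorem 6.1 can be checked with a
few lines of arithmetic. The following is only a sketch: the function name
and the numerical values of the means and variances are made up for
illustration and do not come from the text.

```python
def linear_combination_moments(coeffs, means, variances):
    """Mean and variance of l = sum(a_i * x_i) for independent normal x_i.

    Implements the two sums in Theorem 6.1; the name is hypothetical.
    """
    if not (len(coeffs) == len(means) == len(variances)):
        raise ValueError("coeffs, means and variances must have equal length")
    mean_l = sum(a * m for a, m in zip(coeffs, means))
    var_l = sum(a * a * v for a, v in zip(coeffs, variances))
    return mean_l, var_l


# Theorem 6.2 as a special case: a_i = 1/n, common mean and variance.
# The values n = 4, mu = 10, sigma^2 = 8 are illustrative assumptions.
n, mu, sigma2 = 4, 10.0, 8.0
mean_xbar, var_xbar = linear_combination_moments([1 / n] * n, [mu] * n, [sigma2] * n)

# Theorem 6.6 as a special case: a_1 = 1, a_2 = -1 applied to two sample
# means whose variances are sigma_1^2/n_1 = 7 and sigma_2^2/n_2 = 9.
mean_d, var_d = linear_combination_moments([1, -1], [50.0, 45.0], [7.0, 9.0])

print(mean_xbar, var_xbar)  # mean mu and variance sigma^2 / n
print(mean_d, var_d)        # mean mu_1 - mu_2 and the sum of the variances
```

Note that the variances add even though the coefficient $a_2$ is negative,
since each coefficient enters the variance squared.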

6.3. APPLICATIONS 


We now give a variety of applications of the theorems of Sect. 6.2. From 
Theorem 6.2 it is clear that when the parent population is normal the distri- 
bution of sample means is normal, no matter how small the sample size. 

Example 6.2. A manufacturer of bolts specifies that the diameter of 
a certain type should be 2.5 cm. The standard process makes bolts with 
diameters approximately normally distributed with a standard deviation of 
0.1 cm. Adjustments are regularly required in order that the bolts be not 
too small or too large. Determine if the machine needs adjustment if a random 
sample of nine measurements has a mean diameter of $\bar{x} = 2.62$ cm.

We wish to test the null hypothesis that $\mu = 2.5$ cm, which is often
denoted by

$$H_0\colon \mu = 2.5 \text{ cm}$$

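In order to protect against making bolts which are too small or too large,
it is desirable that adjustments be made if the sample mean is small enough
to be below a fixed small $\bar{x}_1$ or large enough to be above a fixed large $\bar{x}_2$.
Thus, the problem is to determine $\bar{x}_1$ and $\bar{x}_2$. We make the somewhat arbitrary
decision to find values so that, when $\mu = 2.5$ and $\sigma = 0.1$,

$$P[\bar{x} < \bar{x}_1] = 0.025 \quad\text{and}\quad P[\bar{x} > \bar{x}_2] = 0.025$$

or

$$P\!\left[t < \frac{\bar{x}_1 - 2.5}{0.1/\sqrt{9}}\right] = 0.025 \quad\text{and}\quad P\!\left[t > \frac{\bar{x}_2 - 2.5}{0.1/\sqrt{9}}\right] = 0.025$$

Since

$$P[t < -1.960] = 0.025 \quad\text{and}\quad P[t > 1.960] = 0.025$$

we have

$$\frac{\bar{x}_1 - 2.5}{0.1/\sqrt{9}} = -1.960 \quad\text{and}\quad \frac{\bar{x}_2 - 2.5}{0.1/\sqrt{9}} = 1.960$$

or

$$\bar{x}_1 = 2.435 \quad\text{and}\quad \bar{x}_2 = 2.565$$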
So the critical region is made up of all those values of $\bar{x}$ for which
$\bar{x} < 2.435$ or $\bar{x} > 2.565$.
The values $\bar{x}_1$ and $\bar{x}_2$ are called critical points and represent boundary points
between the region of indecision and the critical regions. Since the sample
mean $\bar{x} = 2.62$ is greater than 2.565, we reject the hypothesis that the
population mean is 2.5 cm. That is, we accept the alternate hypothesis that the
mean is different from 2.5 cm. Actually, in this case, we would say that the
mean is larger than 2.5 cm, realizing that there is a very small (less than
0.025) chance of being wrong. Thus, on the basis of the sample, the
machine needs to be adjusted to make smaller bolts.

Above we defined the critical region in terms of $\bar{x}_1$ and $\bar{x}_2$ and used the
sample mean to make a decision. However, we could reach a conclusion
just as well if the values $\bar{x}_1$, $\bar{x}_2$, and the sample mean were expressed in terms of
standard normal deviates. In this case we find directly from the table
$t_1 = -1.960$ and $t_2 = 1.960$, so that the critical region is composed of all
those values of t for which $t < -1.96$ or $t > 1.96$. Further, from the
sample mean we find

$$t = \frac{2.62 - 2.5}{0.1/\sqrt{9}} = \frac{(0.12)(3)}{0.1} = 3.6$$

Since 3.6 is greater than $t_2 = 1.96$, we reject $H_0$ and make the same
conclusion as before. Since there is usually less computation this way, we
usually solve problems in terms of standard values.

[Table 6.1. Probabilities for the Binomial Distribution]
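The two routes taken in Example 6.2 (critical points for the sample mean,
and the standardized value t) can be reproduced numerically. The sketch
below uses Python's standard-library `statistics.NormalDist` in place of
the printed normal table; the function name is our own.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_mean_test(xbar, mu0, sigma, n, alpha=0.05):
    """Symmetric two-sided test of H0: mu = mu0 with known sigma.

    Returns the critical points for the sample mean, the standardized
    statistic t, and whether H0 is rejected.
    """
    se = sigma / sqrt(n)                           # standard error of the mean
    t_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.960 for alpha = 0.05
    lower = mu0 - t_crit * se
    upper = mu0 + t_crit * se
    t = (xbar - mu0) / se
    return lower, upper, t, abs(t) > t_crit


# Values from Example 6.2: nine bolts, mean 2.62 cm, sigma = 0.1 cm.
lower, upper, t, reject = two_sided_mean_test(2.62, 2.5, 0.1, 9)
print(round(lower, 3), round(upper, 3), round(t, 2), reject)
```

Either comparison (the sample mean against the critical points, or t
against 1.96) leads to the same decision, as the text observes.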

Example 6.3. Graph particular binomial distributions in the form of
histograms and compare with the corresponding normal approximations.

From Table I in Handbook of Probability and Statistics with Tables by
Burington and May we obtained probabilities for the binomial distribution
which are shown in columns 2, 3, 4, and 5 of Table 6.1. Using these values,
we construct the histograms shown in Fig. 6.1. Note that corresponding
to each discrete value $x_i$ for the binomial density functions we have a unit
interval from $x_i - \tfrac{1}{2}$ to $x_i + \tfrac{1}{2}$ for the histogram. The height of each rectangle
in the histogram is the same as the density function value. Hence, summing
any subset of density function values $f(x_i)$ is the same numerically as
summing the areas of corresponding rectangles. In each case we obtain
probabilities.

A normal curve obviously would not fit Figs. 6.1a, b, and c well.
However, as shown in Fig. 6.2, a normal curve does fit Fig. 6.1d reasonably well.
The normal distribution has mean $\mu = np$ and standard deviation
$\sigma = \sqrt{np(1 - p)} = 2.19$. The area of the histogram and the area under the
normal curve are both 1. Further, for the histogram the area above the
interval from $x_i - \tfrac{1}{2}$ to $x_i + \tfrac{1}{2}$ is $f(x_i)$, and the corresponding area for
the normal distribution is

$$\int_{x_i - 1/2}^{x_i + 1/2} n(x;\ np,\ npq)\, dx$$

Since the normal curve reaches from $-\infty$ to $+\infty$, it is customary to let
the interval for x = 0 reach from $-\infty$ to $\tfrac{1}{2}$ and the interval for x = n
reach from $n - \tfrac{1}{2}$ to $+\infty$. The last column of Table 6.1 shows how good
the normal approximation is for each value of x for the density function
(d). (The values were found by using the methods of Example 3.10.)
Example 6.4. The occurrence of a one or two in an honest toss of a
well balanced die is considered a success. Use the normal distribution to
find the approximate probability of being successful (a) in more than 15 of
50 tosses and (b) exactly 15 times in 50 tosses. (c) Check (a) and (b) using
either Pearson's [11] or Romig's [13] binomial tables.

In (a) and (b) we find the probabilities for the discrete binomial function
using the continuous normal approximation as described in Example 6.3.
Thus, in order to find $b(16) + \cdots + b(50)$, we first determine the standard
normal t corresponding to $x - \tfrac{1}{2} = 15.5$, that is,

$$t = \frac{15.5 - np}{\sqrt{np(1 - p)}} = \frac{15.5 - 50/3}{\sqrt{100/9}} = -0.35 \tag{6.4}$$
Hence

$$\begin{aligned}
b(16) + \cdots + b(50) &= P[x = 16, \ldots, 50;\ \text{binomial with } n = 50,\ p = \tfrac{1}{3}] \\
&= P[x > 15.5;\ \text{normal with } \mu = \tfrac{50}{3},\ \sigma^2 = \tfrac{100}{9}] \\
&= P[t > -0.35;\ \text{normal with } \mu = 0,\ \sigma^2 = 1] \\
&= 0.6368
\end{aligned}$$

For (b), using the same method, we obtain

$$\begin{aligned}
b(15;\ n = 50,\ p = \tfrac{1}{3}) &= P[14.5 < x < 15.5;\ \text{normal with } \mu = \tfrac{50}{3},\ \sigma^2 = \tfrac{100}{9}] \\
&= P[-0.65 < t < -0.35;\ \text{normal with } \mu = 0,\ \sigma^2 = 1] \\
&= P[t > -0.65] - P[t > -0.35] \\
&= 0.7422 - 0.6368 \\
&= 0.1054
\end{aligned}$$

Using Romig's tables, we must interpolate. Since

$$P[x = 16, \ldots, 50;\ \text{binomial with } n = 50,\ p = 0.34] = 0.6679$$

and

$$P[x = 16, \ldots, 50;\ \text{binomial with } n = 50,\ p = 0.33] = 0.6120$$

we have, by linear interpolation,

$$P[x = 16, \ldots, 50;\ \text{binomial with } n = 50,\ p = \tfrac{1}{3}] = 0.6120 + \tfrac{1}{3}(0.0559) = 0.6306$$

Further, by linear interpolation,

$$P[x = 15;\ \text{binomial with } n = 50,\ p = \tfrac{1}{3}] = 0.1103 - \tfrac{1}{3}(0.1103 - 0.1020) = 0.1075$$

From this we see that the approximations agree to two significant figures.
This is quite adequate for many purposes.
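Where binomial tables are not at hand, the comparison in Example 6.4 can
be made by computing the exact binomial probabilities directly. The
following sketch does so and sets them beside the normal approximation
with the half-unit continuity correction of Example 6.3. The function
names are hypothetical, and exact summation is our substitute for the
table interpolation used above.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(k, n, p):
    """Exact binomial probability b(k; n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_approx(lo, hi, n, p):
    """Approximate P[lo <= x <= hi] for binomial x by the normal curve.

    Each integer k is given the interval (k - 1/2, k + 1/2); the end
    intervals are extended to -infinity and +infinity as in Example 6.3.
    """
    dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    lower = dist.cdf(lo - 0.5) if lo > 0 else 0.0
    upper = dist.cdf(hi + 0.5) if hi < n else 1.0
    return upper - lower


n, p = 50, 1 / 3
exact_tail = sum(binom_pmf(k, n, p) for k in range(16, n + 1))  # part (a)
approx_tail = normal_approx(16, n, n, p)
exact_15 = binom_pmf(15, n, p)                                  # part (b)
approx_15 = normal_approx(15, 15, n, p)

print(f"(a) exact {exact_tail:.4f}, normal approximation {approx_tail:.4f}")
print(f"(b) exact {exact_15:.4f}, normal approximation {approx_15:.4f}")
```

The approximate value in (b) may differ from 0.1054 in the last place,
because the text subtracts two table entries that were each rounded to
four decimals.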

Example 6.5. Find the shortest 90 per cent confidence interval for the
difference between unknown population means $\mu_1$ and $\mu_2$ if it is known that
the two populations are normal with variances $\sigma_1^2 = 70$ and $\sigma_2^2 = 180$,
respectively. Suppose we find that the means of random samples of sizes
$n_1 = 10$ and $n_2 = 20$ are $\bar{x}_1 = 23$ and $\bar{x}_2 = 28$. (The reader can furnish
his own interpretation for such a problem. For example, a manufacturer
may wish to compare the means of measurements made on articles produced
by the oldest and newest machines in the plant; a scientist may wish to make
such a comparison using measurements obtained by two different methods;
an engineer may wish to test tensile strengths of two different metals or
concretes, etc.)

Theorem 6.6 is used to find symmetric 90 per cent limits, since they give
the shortest interval. Since the standard normal t is given by

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

the limits for $\mu_1 - \mu_2$ are given by

$$\bar{x}_1 - \bar{x}_2 \pm t_{.05}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

where $t_{.05}$ is the upper five per cent value obtained from tables of the normal
distribution. In particular, the limits are

$$23 - 28 \pm 1.645\sqrt{\frac{70}{10} + \frac{180}{20}}$$

or

$$-11.58 \quad\text{and}\quad 1.58$$

It appears that $\mu_1$ is very likely to be less than $\mu_2$, but $\mu_1$ could be equal to or
slightly greater than $\mu_2$.
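The limits of Example 6.5 follow from a single application of Theorem 6.6.
As a sketch, with a function name of our own choosing and
`statistics.NormalDist` standing in for the normal table:

```python
from math import sqrt
from statistics import NormalDist


def diff_means_limits(xbar1, xbar2, var1, var2, n1, n2, conf=0.90):
    """Symmetric confidence limits for mu_1 - mu_2, variances known."""
    t = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.645 for 90 per cent
    half_width = t * sqrt(var1 / n1 + var2 / n2)
    d = xbar1 - xbar2
    return d - half_width, d + half_width


# Values from Example 6.5.
low, high = diff_means_limits(23, 28, 70, 180, 10, 20)
print(round(low, 2), round(high, 2))
```

Because zero lies inside the interval, the sample does not rule out equal
population means at this level of confidence.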

Example 6.6. We may use Theorem 6.4 and the number of successes
in a random sample of size n to find approximate confidence limits for
p when n is large. (a) Find approximate $100(1 - \alpha)$ per cent confidence
limits for p when n is large. (b) Use the results obtained in (a) to determine
the 95 per cent confidence limits when n = 100 if, from a random sample,
the total number of successes is x = 20.

Again it is left for the reader to supply his own interpretations. For
example, an inspector may wish to know the minimum and maximum
proportion of defectives to expect if a random sample has x/n defectives;
a biologist may wish to determine the survival rate of a certain type of insect
under a given set of conditions; one sampling public opinion may use this
method to determine what proportions of a population can be expected
to vote a certain way or select a certain article, etc. It should be understood
that in each case the experimenter is at liberty to set the degree of confidence
with which he wishes to work.

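The reader, no doubt, has already noticed, in the special case where
$\theta = \mu$ and the statistic is a mean $\bar{x}$ which is normally distributed, that
the statement

$$P[\hat{\theta}_1 < \theta < \hat{\theta}_2] = \gamma = 1 - \alpha$$

becomes

$$P[\bar{x} - t_{\alpha/2}\,\sigma_{\bar{x}} < \mu < \bar{x} + t_{\alpha/2}\,\sigma_{\bar{x}}] = 1 - \alpha \tag{6.7}$$

where $t_{\alpha/2}$ is the standard normal value defined by $P(t \geq t_{\alpha/2}) = \alpha/2$.
Further, it should be clear that

$$P\!\left[-t_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < t_{\alpha/2}\right] = 1 - \alpha \tag{6.8}$$

is equivalent to Eq. (6.7), since

$$\bar{x} - t_{\alpha/2}\,\sigma_{\bar{x}} < \mu < \bar{x} + t_{\alpha/2}\,\sigma_{\bar{x}}$$

implies that

$$-t_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < t_{\alpha/2}$$

and

$$-t_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < t_{\alpha/2}$$

implies that

$$\bar{x} - t_{\alpha/2}\,\sigma_{\bar{x}} < \mu < \bar{x} + t_{\alpha/2}\,\sigma_{\bar{x}}$$

$\mu$ and $\sigma_{\bar{x}}$ being fixed parameters and $\bar{x}$ being a sample value of the random
variable $\bar{x}$.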

Now in case Theorem 6.4 applies, x, the number of successes in n
independent trials, may replace $\bar{x}$ in Eq. (6.8), so that we may write

$$P\!\left[-t_{\alpha/2} < \frac{x - np}{\sqrt{np(1 - p)}} < t_{\alpha/2}\right] = 1 - \alpha \tag{6.9}$$

For a given value of x in a sample of size n with fixed $t_{\alpha/2} = t_0$, there is
a range of values of p which satisfy the inequality

$$-t_0 < \frac{x - np}{\sqrt{np(1 - p)}} < t_0$$

We can find the limits of the range of values of p by solving the two
equations

$$\frac{x - np}{\sqrt{np(1 - p)}} = \pm t_0 \tag{6.10}$$

(Actually, a correction for continuity should be used. That is, we should
write

$$\frac{x + \tfrac{1}{2} - np}{\sqrt{np(1 - p)}} = -t_0 \quad\text{and}\quad \frac{x - \tfrac{1}{2} - np}{\sqrt{np(1 - p)}} = +t_0$$

But this leads to further complications in solving for p. Anyway, for large
values of n the correction makes very little difference in the limits. For
small values of n the correction probably should be taken into account.)
On squaring both sides, the two equations in (6.10) become the same.
Then the resulting quadratic in p is

$$(x - np)^2 = t_0^2\,[np(1 - p)]$$

or

$$(n^2 + nt_0^2)\,p^2 - (2nx + nt_0^2)\,p + x^2 = 0$$

The $(1 - \alpha)$ confidence limits of p obtained by solving this equation by
the quadratic formula are

$$p = \frac{2nx + nt_0^2 \pm \sqrt{(2nx + nt_0^2)^2 - 4(n^2 + nt_0^2)\,x^2}}{2(n^2 + nt_0^2)} \tag{6.11}$$
To solve (b) we let n = 100, x = 20, and $t_{\alpha/2} = t_0 = 1.96$ in Eq. (6.11).
This gives the limits

$$p_1 = 0.133 \quad\text{and}\quad p_2 = 0.289$$

Formula (6.11) may be used to obtain good approximate limits to the true
proportion whenever the sample size is sufficiently large. However, the
calculations involved are lengthy and unnecessary, in most cases, since
charts have already been prepared. Table III may be used to find limits
for p for confidence coefficients 0.95 and 0.99. In the rare case when other
confidence coefficients are required, Formula (6.11) may be used. In order
to find the 95 per cent confidence limits of p for (b), use the chart in Table
III. Locate $p' = x/n = 0.20$ on the bottom horizontal line, and follow the
vertical line above 0.20 until it intersects the bottom curved line for n = 100.
With a straightedge falling on this point of intersection and placed in
a horizontal position, locate a point where the straightedge intersects the
first vertical line, the scale for p. We read the value $p_1 = 0.125$. In a similar
way we find $p_2 = 0.290$ by using the top curved line for n = 100. These
values compare favorably, to two significant figures, with those obtained
by the formula. Since the limits, at best, are only approximations, we
probably should not use more than two significant figures in any event.
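Since the calculation behind Formula (6.11) is described as lengthy, it
may help to see the quadratic solved mechanically. The sketch below forms
the coefficients of the quadratic in p and applies the quadratic formula;
the function name is hypothetical. (Limits obtained this way are commonly
known as Wilson score limits, a name the text itself does not use.)

```python
from math import sqrt
from statistics import NormalDist


def quadratic_limits(x, n, conf=0.95):
    """Limits for p from (x - n p)^2 = t0^2 n p (1 - p), without continuity correction."""
    t0 = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    a = n * n + n * t0**2            # coefficient of p^2
    b = -(2 * n * x + n * t0**2)     # coefficient of p
    c = x * x                        # constant term
    root = sqrt(b * b - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


# Part (b) of Example 6.6: n = 100 trials, x = 20 successes.
p1, p2 = quadratic_limits(20, 100)
print(round(p1, 3), round(p2, 3))
```

The limits are not symmetric about $p' = 0.20$; the upper limit lies
farther from the sample proportion than the lower one.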
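Example 6.7. It is informative to use an approximate method which is
less accurate than either of the above methods, but which has the advantage
of being short. Replacing $\bar{x}$ in Eq. (6.7) by $p' = x/n$, the proportion of
successes in n independent trials, and remembering (see Theorem 6.5) that
$\mu_{p'} = p$ and $\sigma_{p'}^2 = p(1 - p)/n$ for sample proportions, we obtain the
following inequality with approximate limits

$$p' - t_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} < p < p' + t_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}$$

Allowing for the correction for continuity, we may write

$$p' - \left[\frac{1}{2n} + t_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}\right] < p < p' + \left[\frac{1}{2n} + t_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}\right]$$
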
But the population proportion p which is unknown appears on both sides
of the inequalities. However, if n is large enough, we may replace p with
$p'$, the sample point estimate of the true proportion, to obtain

$$p' - \left[\frac{1}{2n} + t_{\alpha/2}\sqrt{\frac{p'(1 - p')}{n}}\right] < p < p' + \left[\frac{1}{2n} + t_{\alpha/2}\sqrt{\frac{p'(1 - p')}{n}}\right] \tag{6.12}$$

Using Eq. (6.12) and the sample values of Example 6.6(b), we find the limits

$$p' \pm \left[\frac{1}{2n} + t_{.025}\sqrt{\frac{p'(1 - p')}{n}}\right] = 0.2 \pm \left[0.005 + 1.96\sqrt{\frac{(0.2)(0.8)}{100}}\right] = 0.200 \pm 0.083$$

or

$$p_1 = 0.117 \quad\text{and}\quad p_2 = 0.283$$

Ignoring the correction $1/2n = 0.005$, we find $p_1 = 0.122$ and $p_2 = 0.278$.
Thus, to two decimal places, we obtain limits 0.12 and 0.28 in both cases,
and they are both lower than the more exact limits of 0.13 and 0.29.
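The short method of Example 6.7 is easily put in computational form. The
following sketch evaluates Eq. (6.12) with and without the correction
$1/2n$; the function name and the `continuity` flag are our own devices.

```python
from math import sqrt
from statistics import NormalDist


def short_limits(x, n, conf=0.95, continuity=True):
    """Approximate limits for p using p' = x/n in place of p under the radical."""
    p_hat = x / n
    t = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = t * sqrt(p_hat * (1 - p_hat) / n)
    if continuity:
        half_width += 1 / (2 * n)    # correction for continuity
    return p_hat - half_width, p_hat + half_width


# Sample values of Example 6.6(b): n = 100, x = 20.
with_corr = short_limits(20, 100, continuity=True)
without_corr = short_limits(20, 100, continuity=False)
print([round(v, 3) for v in with_corr])
print([round(v, 3) for v in without_corr])
```

Unlike the limits from Formula (6.11), these are always symmetric about
the sample proportion.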

Example 6.8. In a particular city, in response to a certain question 100 
of 400 women answered “yes,” and 150 of 500 men answered “yes.” Use 
these data to establish 90 per cent confidence limits for the true difference 
in the proportion of women and men responding “yes.” Assume that the 
samples were randomly drawn. 

If it is assumed that Theorem 6.7 applies, approximate confidence limits
of $p_1 - p_2$ are given by

$$p'_1 - p'_2 \pm t_{.05}\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

where the subscript 1 denotes women and the subscript 2 men. Assuming
the samples to be large enough so that the sample proportions may be
used as good point estimates of the true proportions, we introduce a further
approximation in the confidence limits by using

$$p'_1 - p'_2 \pm t_{.05}\sqrt{\frac{p'_1(1 - p'_1)}{n_1} + \frac{p'_2(1 - p'_2)}{n_2}} \tag{6.13}$$

Substituting the given values in Eq. (6.13) along with $t_{.05} = 1.645$ gives
the limits $-0.099$ and $-0.001$. That is, the 90 per cent confidence interval is

$$-0.099 < p_1 - p_2 < -0.001$$

Thus, we conclude that a larger proportion of men in the city will vote
“yes” on the question, understanding that we may be wrong (but it is not
likely).
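The substitution into Eq. (6.13) can be carried out as follows. This is a
sketch only; the function name is hypothetical and the normal quantile is
taken from `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def diff_props_limits(x1, n1, x2, n2, conf=0.90):
    """Approximate limits for p_1 - p_2 with sample proportions under the radical."""
    p1, p2 = x1 / n1, x2 / n2
    t = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - t * se, d + t * se


# Example 6.8: 100 of 400 women and 150 of 500 men answered "yes".
low, high = diff_props_limits(100, 400, 150, 500)
print(round(low, 3), round(high, 3))
```

The upper limit falls only barely below zero, which is why the conclusion
in the text is stated with some caution.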

It is informative to note at this time that the standard normal value
$t_{\alpha/2}$ used to obtain the symmetric $(1 - \alpha)100$ per cent confidence interval
of $\mu_s$ is the same value used to make the corresponding symmetric $\alpha$ level
test of the null hypothesis $\mu_s = k$, where k is a real number and s is any
statistic which is normally distributed with mean $\mu_s$ and variance $\sigma_s^2$.
However, the confidence limits and the corresponding critical points in
terms of the statistic s are not the same. Actually, they would be the same
only in very rare cases.

Consider Example 6.2. For the five per cent level test of the null
hypothesis $\mu = 2.5$ cm the symmetric 2.5 per cent critical points are $t_1 = -1.96$
and $t_2 = 1.96$ in terms of the standard statistic t, and $\bar{x}_1 = 2.435$ and
$\bar{x}_2 = 2.565$ in terms of the sample statistic $\bar{x}$. But the confidence limits are

$$\bar{x} \pm t_{.025}\,\sigma_{\bar{x}} = 2.62 \pm 1.96\left(\frac{0.1}{3}\right) = 2.620 \pm 0.065$$

or

2.555 and 2.685

which are quite different from the critical points 2.435 and 2.565.

For the information given in Example 6.2 the 95 per cent confidence
interval for the diameter of the average bolt manufactured is

$$2.555 < \mu < 2.685$$

In this case, it might be observed that the hypothesized mean $\mu = 2.5$ cm
does not fall in the confidence interval. This agrees with our earlier
conclusion, where we rejected the null hypothesis and accepted the fact that
the mean is larger than 2.5 cm. As a matter of fact, the confidence interval
indicates what values the parameter is likely to take on. Thus, in this case
at least, we see that a statement at the five per cent level can be made about
the null hypothesis if we have the 95 per cent confidence interval. The reader
should be cautioned on this point lest he think this is always the case.
Actually, in most cases the symmetric confidence interval is used since it
is shortest, whereas the critical region, depending on the alternative
hypothesis, is often not symmetric, in which case the confidence limits cannot
be used to reach a decision regarding the null hypothesis. Thus we may
state the following rule: if, in problems in which Theorems 6.1 through
6.7 apply, the critical region is divided into two parts, each with equal
probability and reaching to infinity, then the symmetric confidence interval
can be used to indicate a decision on the null hypothesis $\mu_s = k$, the
alternative hypothesis being $\mu_s \neq k$, provided the same statistic is appropriate
in each case.

This brings us to the second point. In some cases, such as in Example
6.8, different statistics are used in determining confidence limits and critical
points. In Example 6.8 the statistic used to find the confidence limits was

$$t = \frac{(p'_1 - p'_2) - (p_1 - p_2)}{\sqrt{\dfrac{p'_1(1 - p'_1)}{n_1} + \dfrac{p'_2(1 - p'_2)}{n_2}}} \tag{6.14}$$

However, in order to test the corresponding null hypothesis $p_1 - p_2 = 0$
(or $p_1 = p_2$), symmetric critical regions being used, the most appropriate
statistic is

$$t = \frac{(p'_1 - p'_2) - 0}{\sqrt{p'(1 - p')\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \tag{6.15}$$

where

$$p' = \frac{x_1 + x_2}{n_1 + n_2}$$

For under the assumption of the null hypothesis $p_1 = p_2$, we may let p be
the common true proportion of successes for both populations, and use
the pooled sample to obtain a single estimate $p'$ of the true proportion.
Since more observations go into finding the estimate $p'$, we would expect
$p'$ to be better than either $p'_1$ or $p'_2$ in the sense that the sampling distribution
of $p'$ has less spread than either of the sampling distributions of $p'_1$ or $p'_2$.

Example 6.9. Use the data of Example 6.8 to test at the ten per cent
level the null hypothesis $p_1 = p_2$ against the alternative hypothesis $p_1 \neq p_2$.

If it is assumed that Theorem 6.7 applies, the critical region, in terms of
the standard normal variate, is the set of values for which $t < -1.645$ or
$t > 1.645$. The standard normal deviate for our samples is given by

$$t = \frac{\dfrac{100}{400} - \dfrac{150}{500}}{\sqrt{\dfrac{250}{900}\cdot\dfrac{650}{900}\left(\dfrac{1}{400} + \dfrac{1}{500}\right)}} = -1.66$$

Since the sample value of the statistic falls in the negative critical region,
we reject the null hypothesis and conclude that the true mean proportion
of women answering “yes” will be less than for men. The student should
satisfy himself that the use of Eq. (6.14) could lead to a different conclusion.
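The reader who wishes to compare the pooled statistic of Eq. (6.15) with
the unpooled statistic of Eq. (6.14) may do so with the sketch below. The
function names are hypothetical; both statistics are evaluated under the
null hypothesis $p_1 - p_2 = 0$.

```python
from math import sqrt
from statistics import NormalDist


def pooled_statistic(x1, n1, x2, n2):
    """Eq. (6.15): pooled estimate p' = (x1 + x2)/(n1 + n2) under the radical."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


def unpooled_statistic(x1, n1, x2, n2):
    """Eq. (6.14) with p1 - p2 = 0: separate estimates under the radical."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


t_crit = NormalDist().inv_cdf(0.95)           # ten per cent level, two-sided
t_pooled = pooled_statistic(100, 400, 150, 500)
t_unpooled = unpooled_statistic(100, 400, 150, 500)
print(round(t_pooled, 2), round(t_unpooled, 2), round(t_crit, 3))
```

For these particular data the two statistics fall on the same side of the
critical point; they differ in the second decimal place, so with other
data near the boundary the two could lead to different decisions.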

Example 6.10. Compare the mean and median for small sample sizes.

We indicated earlier that the mean is generally preferred to the median
for a given sample size. The reason for this is indicated in Theorem 6.3,
since so many of the populations encountered in practice are approximately
normally distributed. According to the theorem

$$\sigma_{\tilde{x}}^2 = \frac{\pi\sigma^2}{2n} = \frac{\pi}{2}\,\sigma_{\bar{x}}^2$$

when n is large. That is, the variance of the median is $\pi/2 = 1.57$
times as large as the variance of the mean for large samples (so that the
standard error of the median is $\sqrt{\pi/2} = 1.25$ times as large as the standard
error of the mean).

For small n it can be shown [5, 10] that the relationship $\sigma_{\tilde{x}}^2 > \sigma_{\bar{x}}^2$ still
holds except in the case where n = 2. Table 6.2 shows how the variance
of the median changes for small samples drawn from a normal population.
Thus, when estimating the population mean $\mu$ from a random sample of
size n > 2, we prefer the sample mean to the sample median, because the
sample mean deviates less from $\mu$ on the average.

Table 6.2

Variance of the Median when $\sigma^2 = 1$ and Samples are Drawn
from a Normal Population

" I ^ ^ ^ ^ 

I 100 IM n9 Tm r» MT 139 

There is another sense in which we may relate the sample mean and
sample median. Suppose we require that the variance of the statistic (mean
or median) which estimates the population mean not exceed a fixed value
$\sigma_0^2$. Then the sample sizes n and n' of the mean and median, respectively,
must be large enough so that

$$\frac{\sigma^2}{n} \leq \sigma_0^2 \quad\text{and}\quad \frac{\pi\sigma^2}{2n'} \leq \sigma_0^2$$

Thus, when the standard errors of the mean and median equal $\sigma_0$,

$$\frac{\sigma^2}{n} = \frac{\pi\sigma^2}{2n'}$$

or

$$n' = \frac{\pi}{2}\,n = 1.57\,n \quad\text{for large } n \tag{6.16}$$

Thus, when one is sampling from a normal population, the sampling
distribution of the median of samples of size $(\pi/2)n$ is the same as the sampling
distribution of the mean of samples of size n. Hence an inference statement
based upon a sample mean of size n can be expected to be just as dependable
as one based upon a sample median of size $(\pi/2)n$. In particular, if the
population variance is known, the length of a $100(1 - \alpha)$ per cent confidence
interval for the population mean as determined from the mean of a sample
of size 40 is the same as the length of a $100(1 - \alpha)$ per cent confidence
interval determined from the median of a sample of size $(\pi/2)(40) = 63$. The
interval determined from the median of a sample larger than 63 would be
shorter.
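Equation (6.16) amounts to a one-line calculation. The sketch below uses
the large-sample variance of Theorem 6.3 and rounds the required sample
size up to the next integer; both function names are our own, and the
rounding rule is an assumption consistent with the figure 63 quoted above.

```python
import math


def large_sample_median_variance(sigma2, n):
    """Large-sample variance of the median from Theorem 6.3: pi * sigma^2 / (2 n)."""
    return math.pi * sigma2 / (2 * n)


def median_sample_size(n_mean):
    """Smallest n' whose median is at least as precise as the mean of n_mean values."""
    return math.ceil(math.pi / 2 * n_mean)


n_median = median_sample_size(40)
print(n_median)
print(large_sample_median_variance(1.0, n_median), 1.0 / 40)
```

With 62 observations the variance of the median would still exceed that
of the mean of 40, so 63 is the smallest size that suffices.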

Example 6.11. In industrial work and in other areas, the statistical
control of quality in a continuing process has proven to be very important.
Actual control situations are varied and are studied with the aid of many
kinds of statistics. We consider a special case involving sample means
computed from random samples drawn from a normal population with
mean $\mu$ and variance $\sigma^2$.

In a hypothetical investigation, suppose that one determination of raw 
pulp viscosity is made on each work day (hour, or any constant period of 
time) at roughly the same time for 30 weeks. Further, suppose the mean 
raw pulp viscosity per week for each of the 30 weeks is as follows. 


Table 6.3

Week   Mean     Week   Mean     Week   Mean
  1    140       11    135       21    135
  2    135       12    155       22    125
  3    155       13    135       23
  4    145       14    140       24
  5    135       15    140       25
  6    135       16    140       26    195
  7    160       17    145       27    190
  8    140       18    145       28
  9    145       19    150       29    190
 10    125       20    155       30    165


(a) Prepare a control chart for the 30 means, (b) Discuss the use of the 
control chart in (a), (c) Give a short discussion of the term quality control. 

In constructing the control chart in Fig. 6.3, we assume the observations
to be randomly drawn from a normal population with mean $\mu = 145$
and variance $\sigma^2 = 256$. Therefore, the sample means $\bar{x}$ are normally
distributed with mean $\mu_{\bar{x}} = 145$ and variance $\sigma_{\bar{x}}^2 = \sigma^2/n = 256/5 = 51.2$. The sample
means were based on samples of size five, there being five work days. Each
point on the control chart was located by letting the week number be the
abscissa and the corresponding mean be the ordinate.

Fig. 6.3 Control Chart for Means (ordinate: $\bar{x}$; abscissa: week)

The three lines parallel to the axis of abscissas are for means
$\mu - 3\sigma/\sqrt{5} = 123.5$, $\mu = 145$, and $\mu + 3\sigma/\sqrt{5} = 166.5$. The top line
is called the upper control limit, the bottom line the lower control limit,
and $\mu = 145$ the center line. Under the assumptions, the probability of
a sample mean's falling above the line $\bar{x} = 166.5$ is 0.00135, and the
probability of a sample mean's falling below the line $\bar{x} = 123.5$ is 0.00135.
Thus, so long as the assumptions hold, the sample means vary randomly
about the center line $\bar{x} = \mu = 145$ with a probability of 0.00270 of falling
outside the two limits. Since the chance of a sample mean's falling outside
the limits is so small, we say that the process is out of control whenever
this happens.

In our example the process went out of control during the twenty-sixth
week and actually stayed out of control for the next three weeks. With such
strong evidence that all is not right, the production process should be
investigated. Thus such things as raw material, machines, ways in which
the machines are adjusted and operated, and various other aspects of the
process should be examined and corrected when necessary.
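The control limits and the out-of-control weeks can be found without
drawing the chart. The sketch below uses the weekly means recorded in
Table 6.3, omitting those weeks for which no mean is shown; the function
names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Weekly means from Table 6.3; weeks with no recorded mean are left out.
weekly_means = {
    1: 140, 2: 135, 3: 155, 4: 145, 5: 135, 6: 135, 7: 160, 8: 140,
    9: 145, 10: 125, 11: 135, 12: 155, 13: 135, 14: 140, 15: 140,
    16: 140, 17: 145, 18: 145, 19: 150, 20: 155, 21: 135, 22: 125,
    26: 195, 27: 190, 29: 190, 30: 165,
}


def control_limits(mu, sigma, n, k=3):
    """Lower and upper control limits, k standard errors from the center line."""
    se = sigma / sqrt(n)
    return mu - k * se, mu + k * se


def out_of_control(means, lcl, ucl):
    """Weeks whose sample mean falls outside the control limits."""
    return [week for week, m in sorted(means.items()) if m < lcl or m > ucl]


lcl, ucl = control_limits(mu=145, sigma=16, n=5)   # sigma^2 = 256
flagged = out_of_control(weekly_means, lcl, ucl)
tail = 1 - NormalDist().cdf(3)                     # chance of exceeding one limit

print(round(lcl, 1), round(ucl, 1))
print(flagged)
print(round(tail, 5), round(2 * tail, 4))
```

Week 30, with a mean of 165, falls just inside the upper limit, so on
these limits the process is again in control by the end of the record.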

The control limits in this particular example are symmetrical about
the population mean and located three standard errors away. Sometimes
the limits may not be symmetrically placed. In fact, only one limit may be
required. For example, in many tensile-strength investigations, say, we
require only a lower control limit since the object is to eliminate production
of weak material.

Control limits may be located any number of units on either side of
the population mean, say. However, it is customary to locate them in such
a way that whenever a point falls outside of the control region the
investigator (process engineer or trouble shooter) can expect to look for
serious trouble, and not waste time looking for minor or nonexistent trouble
spots. Actually, control limits should be placed so as to strike an economical
balance between looking for trouble that does not exist and failing to
look for trouble that does exist.

Since the population mean and variance are not often known, the center
and control limit lines are usually located by applying estimates obtained
from several of the first samples. This involves problems which can be
handled better in later chapters.

Statistical control of quality has widespread applications. A large
body of methods in quality control has developed since W. A. Shewhart
first introduced the techniques in industry in 1926. The control chart, even
though it can be used with almost any statistic, represents only one of the
methods. For the student interested in reading more on the use of control
charts and other methods in statistical control of quality, there are many
books and articles available. A few references [1, 2, 4, 12, 14] are given
at the end of this chapter.

In almost any kind of repeated-measures investigation we expect a 
certain amount of variation. We think of part of the variation as resulting 
from chance (or random) causes and part as resulting from assignable (or 
systematic or controllable) causes. If, in a process, only random causes 
account for the variation, we say that the process is in statistical control. 
Otherwise, the process is out of statistical control, and our job is to measure 
and correct (or adjust) the assignable causes. It is doubtful that a process 
can ever be brought in complete control, but it can be brought close enough 
for practical purposes. 

There are many quality characteristics which are measurable, for example, 
the tensile strength of wire, life of a light bulb, number of defective units 
in a lot, and number of rough spots on a surface. Terms like variable and 
attribute are often used to distinguish between the continuous and discrete 
cases in statistical quality control. (The above concepts of quality control, 
along with others, are discussed in other chapters and in the exercises of 
this chapter.) 

The reader should recognize that the control chart provides a graphic 
way of repeatedly testing the same hypothesis relative to the quality of 
a product. The purpose of the test is to determine repeatedly, on the basis 
of a small sample, whether a desired standard of quality is actually being 
met. The hypothesis subject to test is that the “process is in control.” (In
Example 6.11, the statement “process is in control” means “a mean of 
samples of size five is not significantly different from 145.”) If for any 
sample one fails to reject the hypothesis, it is presumed that the process is 
operating satisfactorily. If for any sample the hypothesis is rejected, it is 
presumed that the process is not operating satisfactorily, that is, is out of 
control, and a search is made to detect the source of assignable variation. 

6.4. ERROR TYPES AND OPERATING CHARACTERISTIC CURVES 

Once the null and alternative hypotheses, the sample size, and the
significance level of the test are decided on, a sample (usually random) is
selected from a population under consideration, and the sample is used to
make a decision concerning the null hypothesis. Unfortunately, in reaching
a decision, we cannot be sure that a mistake will not be made. In fact, there
are two types of mistakes which are possible. It may happen that the null
hypothesis is true and we conclude that it is false. In this case, we say the
mistake is a type 1 error. The probability of making this error is sometimes
called the size of the type 1 error and is the same as the significance level of
the test. It may happen that the null hypothesis is false and we fail to reject
it. This mistake is called a type 2 error. The frequency with which type 1
and type 2 errors are made is very important to the experimenter. We shall
see that the frequency of errors can be controlled to some extent.

The meaning of the terms just introduced, as well as others already
discussed in hypothesis testing, can best be brought out by an illustration.

Example 6.12. We discuss the average breaking strength of a certain
type of yarn as measured in ounces, assuming the population of breaking
strength of yarn, x, manufactured by Bob to be normally distributed with
variance $\sigma^2 = 100$. Manufacturer Bob contends that the mean $\mu$ is at least
1000 oz. Thus, our null hypothesis $H_0$ is that $\mu \geq 1000$, that is,
$H_0\colon \mu \geq 1000$. The distributor, Joe, however, wants some kind of statistical
verification of this hypothesized value of the mean breaking strength against
the alternative hypothesis that $\mu < 1000$, that is, $H_1\colon \mu < 1000$. Suppose
that Joe and Bob agree on selecting at random 25 pieces of yarn and
that the breaking strength of each piece be tested by some standard procedure.
It should be observed that Joe would have no further worry if $\bar{x}$ were greater
than 1000 oz. However, with an $\bar{x}$ below 1000 Joe would wonder how often
he could expect to get such a small sample mean due to chance causes alone.
He may doubt the claim of Bob. In any case, it is decided that the statistical
test should be carried out. This requires that Joe and Bob agree on the
maximum size of the type 1 error, that is, that they agree on how often
they are willing to reject $H_0$ on the basis of the results of one sample when
$H_0$ is actually true.

In order to see how they might arrive at a mutually satisfactory answer,
we shall consider the sampling distribution of $\bar{x}$ for samples of size 25.
From previous theorems we know that $\bar{x}$ is normally distributed with
mean $\mu_{\bar{x}} = \mu = 1000$ and variance $\sigma_{\bar{x}}^2 = \sigma^2/n = 100/25 = 4$ under the
assumption that the null hypothesis is true. We have used the most meaningful
$\mu$ of $H_0$, that is, $\mu = 1000$. The graph of this distribution is shown in
Fig. 6.4.

The distributor will maintain that if X is less than some fixed value, say, 

(see Fig 6 4), the hypothesis must be rejected and the manufacturing 
process improved On the other hand, the manufacturer will propose a smaller 
value of X, say, X, However, the manufacturer and distributor must agree 



Fig 6 4 Normal Density 
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on a value, say, Xq, before proceeding with the test. The value x„ is the critical 
point, and it will usually lie between x, and Xj. The region to the left of the 
critical point is the critical region, wherein we reject the hypothesis, and the 
area above the critical region and below the density curve which we denote 
by a is the significance level of the test. This value a gives the probability 
of rejecting the null hypothesis when it is true; that is, it gives the proba- 
bility of making the type 1 error. The student should note that once Xo is 
agreed on, a can be found, or if agreement is reached on the size of a, then 
Xo can be computed. The usual value of a (also called the “producer’s 
risk”) is five per cent, although, in general, this is a matter of compromise. 
The region to the right of the critical point is the region of indecision, or 
region of nonrejection, and if our sample x falls in this region, we fail to 
reject the hypothesis. Actually, if no changes are made in the manufacturing 
process, we act as though we accept the hypothesis tentatively. It should 
be noted that even though the statistician may be able to reserve judgment 
when the sample mean falls in the noncritical region, the man who is re- 
sponsible can take no such stand. 

Returning to our original problem, suppose it is decided that $\alpha = 5$ 
per cent. Then, since $\bar{x} \sim n(1000, 4)$ under the null hypothesis, we can use 
Table II to find our critical point, which turns out to be 996.71, since 
$t = -1.645 = (\bar{x}_0 - 1000)/2$ implies $\bar{x}_0 = -3.290 + 1000$. The critical 
region is that region for which $\bar{x} < 996.7$. Since the mean of the random 
sample of 25 pieces of yarn actually turned out to be 990 oz, and since 
$990 < 996.7 = \bar{x}_0$, we reject the hypothesis that the average breaking 
strength of yarn made by Bob is equal to or greater than 1000 oz and 
conclude that the average breaking strength is less than 1000 oz, understanding 
that there is a chance of making an error. Actually, since 990 is five standard 
deviations away from the mean, there is less than a 0.0000003 chance of 
being wrong. 
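
The arithmetic of this example can be checked with a short computation. The following is only a sketch, assuming Python's standard `statistics.NormalDist` in place of Table II; the variable names are ours and do not come from the text.

```python
from statistics import NormalDist

mu0 = 1000.0          # boundary value of the null hypothesis (oz)
sigma = 10.0          # population standard deviation, since sigma^2 = 100
n = 25                # sample size
alpha = 0.05          # agreed size of the type 1 error
xbar_observed = 990.0

std_error = sigma / n ** 0.5                 # sigma / sqrt(n) = 2
t_alpha = NormalDist().inv_cdf(alpha)        # about -1.645
critical_point = mu0 + t_alpha * std_error   # about 996.71

# Reject when the observed mean falls in the critical region (left tail).
reject = xbar_observed < critical_point

# Chance of a sample mean of 990 or less when mu is really 1000.
tail_prob = NormalDist(mu0, std_error).cdf(xbar_observed)

print(f"t_alpha        = {t_alpha:.3f}")
print(f"critical point = {critical_point:.2f}")
print(f"reject H0      = {reject}")
print(f"tail area      = {tail_prob:.2e}")
```

The tail area printed last is the quantity the text bounds by 0.0000003.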

We make a few observations at this time. First, it should be observed 
that the sampling distribution of $\bar{x}$ is a normal distribution. For a large sample 
size, say, $n > 30$, the mean $\bar{x}$ can be expected to be normally distributed 
regardless of the distribution of $x$, if it is assumed that $x$ has finite variance. 
For small samples, we must assume that $x$ is normally distributed before 
the above argument holds. Second, we used the particular value $\mu = 1000$ 
of the null hypothesis to draw the curve in Fig. 6.4 and to locate the critical 
point and critical region for the test. The reason for this seems obvious. 
For example, suppose we had used the value $\mu = 1010$ to draw the curve; 
then the critical point would have been 1006.71, and the critical region 
would have been all those $\bar{x}$ values for which $\bar{x} < 1006.71$. Thus, if a particular 
sample had a mean of 1000, we would find ourselves in a position of 
rejecting the hypothesis that $\mu \ge 1000$, which is ridiculous. Third, the critical 
point would be different for a different sample size $n$ or a different value of 
$\alpha$ or a different alternative hypothesis. Clearly, these three things should 
be decided on before proceeding with the test. Fourth, the location of the 
critical region depends on the nature of the distribution as well as the 
statements of the hypotheses. For example, if the hypothesis had been that 
$\mu \le 1000$, then only very large values of the sample mean $\bar{x}$ would make 
the experimenter want to reject the hypothesis. Thus, the critical region 
would be made up of all those means which are greater than some critical 
value $\bar{x}_0$ which is larger than 1000, that is, $\bar{x} > \bar{x}_0 > 1000$. If the null 
hypothesis had been that $\mu = 1000$, with the alternative hypothesis 
$\mu \ne 1000$, the critical region would be made up of two parts, as has been 
illustrated in Examples 6.2 and 6.9. For a symmetric distribution the two 
parts are located so that $\alpha/2$ of the area of the density function is above 
each part, and these regions are located at the ends of the scale of 
measurements. Such tests are called two-tailed tests. If the critical region is located 
at one end of the scale of measurements, the test is called a one-tailed test. 

In the above illustration of the test of the hypothesis that the population 
mean of the breaking strength of yarn manufactured by Bob is equal to 
or greater than 1000 oz, we have proceeded in a definite way. We now 
set down a general procedure for testing hypotheses, and in other examples 
in this and later chapters we follow this procedure. 


1. State the null and the alternative hypotheses. 
2. State the assumptions and specify $n$ and $\alpha$. (Ideally $\alpha$ and the 
   probability of making the type 2 error should be specified.) 
3. Specify the statistic to be tested. 
4. Determine the critical region from the required table. 
5. Compute the statistic from the sample. 
6. Write the conclusion in terms of the statement of the hypothesis. 
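
The six steps above can be sketched as a small routine for the case treated in this section, a test on the mean of a normal population with known variance. This is an illustrative sketch only: the function name `z_test_mean` and its arguments are our own, and the standard normal points are computed with `statistics.NormalDist` rather than read from a table.

```python
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n, alpha, alternative="less"):
    """Test a hypothesis about a normal mean when sigma is known.

    alternative: "less" (Ha: mu < mu0), "greater" (Ha: mu > mu0),
    or "two-sided" (Ha: mu != mu0).
    Returns (statistic, critical values, reject?).
    """
    std = NormalDist()
    # Steps 3 and 5: the statistic is the standardized sample mean.
    t = (xbar - mu0) / (sigma / n ** 0.5)
    # Step 4: critical region taken from the standard normal distribution.
    if alternative == "less":
        crit = (std.inv_cdf(alpha),)
        reject = t < crit[0]
    elif alternative == "greater":
        crit = (std.inv_cdf(1 - alpha),)
        reject = t > crit[0]
    elif alternative == "two-sided":
        upper = std.inv_cdf(1 - alpha / 2)
        crit = (-upper, upper)
        reject = t < -upper or t > upper
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return t, crit, reject

# Steps 1 and 2 for the yarn example: H0: mu >= 1000 against Ha: mu < 1000,
# normal population with sigma = 10, n = 25, alpha = 0.05.
t, crit, reject = z_test_mean(990, 1000, 10, 25, 0.05, "less")

# Step 6: state the conclusion in terms of the hypothesis.
print("statistic:", t, "critical value:", round(crit[0], 3))
print("reject H0: mu >= 1000" if reject else "fail to reject H0: mu >= 1000")
```

Keeping the alternative as an explicit argument mirrors the remark that the location of the critical region depends on the statement of the hypotheses.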

Sometimes we fail to reject the null hypothesis when it is not true. In 
such a case we make a type 2 error. Denoting the probability of such an 
error by $\beta$, we determine its value for specified values of the alternative 
hypothesis by 

$$\beta = P[\text{failing to reject } H_0;\ H_a \text{ is true}] \tag{6.17}$$

The right-hand side of Eq. (6.17) is read as "the probability of failing to 
reject the null hypothesis, given that the alternative hypothesis is true." 

Example 6.13. In the yarn experiment assume that the true mean is 
$\mu = 998$ oz. Thus the probability of making a type 2 error is given in Fig. 
6.5 by the area under the curve with mean $\mu = 998$ and above the region of 
nonrejection, that set of values of $\bar{x}$ for which $\bar{x} > 996.7$. In particular, we 
have 

$$
\begin{aligned}
\beta &= P[\text{failing to reject } \mu_0 = 1000;\ \bar{x} \text{ normal with } \mu = 998 \text{ and } \sigma_{\bar{x}}^2 = 4] \\
&= \int_{996.7}^{\infty} \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\bar{x} - 998}{2}\right)^2}\, d\bar{x} \\
&= \int_{-0.65}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt \qquad \text{when } t = \frac{\bar{x} - 998}{2} \\
&= 1 - F(-0.65) = 1 - 0.26 = 0.74
\end{aligned}
$$

Since the true value of $\mu$ is not usually known, we graph $\beta = \beta(\mu)$ for 
different values of $\mu$, and this graph is called the operating characteristic 
curve (abbreviated O. C. curve). It gives us at a glance the probability of 
making a type 2 error for any alternate true value of $\mu$. 

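The general formula for computing $\beta$ when $\sigma^2$ is known is derived below 
for the null hypothesis that $\mu \ge \mu_0$, where the alternative hypothesis is 
$H_a\colon \mu < \mu_0$. We have 

$$\beta(\mu) = P[\text{failing to reject } \mu_0;\ \mu \text{ is true}]$$

or 

$$\beta(\mu) = \frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma} \int_{\bar{x}_0}^{\infty} e^{-\frac{n}{2}\left(\frac{\bar{x} - \mu}{\sigma}\right)^2}\, d\bar{x} \tag{6.18}$$

where $\bar{x}_0 = \mu_0 + t_\alpha\,\sigma/\sqrt{n}$. If we let 

$$t = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

Eq. (6.18) becomes 

$$\beta(\mu) = \frac{1}{\sqrt{2\pi}} \int_{t_0}^{\infty} e^{-t^2/2}\, dt \tag{6.19}$$

where 

$$t_0 = (\bar{x}_0 - \mu)\frac{\sqrt{n}}{\sigma} = \left(\mu_0 + t_\alpha \frac{\sigma}{\sqrt{n}} - \mu\right)\frac{\sqrt{n}}{\sigma} = t_\alpha + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}$$

Thus, we may write 

$$\beta(\mu) = 1 - F(t_0)$$

or 

$$\beta(\mu) = 1 - F\!\left(\frac{\mu_0 - \mu}{\sigma/\sqrt{n}} + t_\alpha\right) \tag{6.20}$$

Example 6.14. In Examples 6.12 and 6.13 we have $\sigma = 10$, $n = 25$, 
$\mu_0 = 1000$, $\alpha = 0.05$, and $t_\alpha = -1.645$. Thus Eq. (6.20) becomes 

$$\beta(\mu) = 1 - F\!\left(\frac{1000 - \mu}{2} - 1.645\right) \tag{6.21}$$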


In order to plot the curve $\beta(\mu)$ shown in Fig. 6.6, we determine the 
following points from Eq. (6.21): 

$$
\begin{aligned}
\beta(999) &= 1 - F(0.5 - 1.645) = 1 - F(-1.145) = 1 - 0.126 = 0.874 \\
\beta(998) &= 1 - F(1 - 1.645) = 1 - F(-0.645) = 1 - 0.259 = 0.741 \\
\beta(997) &= 1 - F(-0.145) = 0.558 \\
\beta(996) &= 1 - F(0.355) = 0.361 \\
\beta(995) &= 1 - F(0.855) = 0.196 \\
\beta(994) &= 1 - F(1.355) = 0.088 \\
\beta(993) &= 1 - F(1.855) = 0.032
\end{aligned}
$$
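
The points of the operating characteristic curve can be regenerated from the formula $\beta(\mu) = 1 - F\left((\mu_0 - \mu)/(\sigma/\sqrt{n}) + t_\alpha\right)$ for the test of $\mu \ge \mu_0$ against $\mu < \mu_0$. The sketch below assumes Python's `statistics.NormalDist` for $F$; the function name `beta_lower_tail` is ours. Because it uses the unrounded value of $t_\alpha$ rather than $-1.645$, the results may differ from the tabled points in the third decimal.

```python
from statistics import NormalDist

def beta_lower_tail(mu, mu0=1000.0, sigma=10.0, n=25, alpha=0.05):
    """Type 2 error probability for H0: mu >= mu0 against Ha: mu < mu0."""
    std = NormalDist()
    t_alpha = std.inv_cdf(alpha)                      # about -1.645
    t0 = (mu0 - mu) / (sigma / n ** 0.5) + t_alpha    # lower limit of integration
    return 1.0 - std.cdf(t0)

# Points for the O. C. curve of the yarn example.
for mu in range(999, 992, -1):
    print(mu, round(beta_lower_tail(mu), 3))
```

Changing `n` or `alpha` in the call shows at once how the curve shifts with the sample size and the significance level.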



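Suppose $x \sim n(\mu, \sigma^2)$ and $\sigma^2$ is known; then the general formula for 
computing $\beta$ for the hypothesis that $\mu \le \mu_0$ against the alternative 
hypothesis $\mu > \mu_0$ is given by 

$$\beta(\mu) = F\!\left(\frac{\mu_0 - \mu}{\sigma/\sqrt{n}} - t_\alpha\right) \tag{6.22}$$

If the null hypothesis is $\mu = \mu_0$ and the alternative hypothesis is $\mu \ne \mu_0$, 
then for symmetric critical regions 

$$\beta(\mu) = F\!\left(\frac{\mu_0 - \mu}{\sigma/\sqrt{n}} - t_{\alpha/2}\right) - F\!\left(\frac{\mu_0 - \mu}{\sigma/\sqrt{n}} + t_{\alpha/2}\right) \tag{6.23}$$

where $t_\alpha$ and $t_{\alpha/2}$ are the (negative) points of the standard normal 
distribution with areas $\alpha$ and $\alpha/2$, respectively, to their left. 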


In theoretical work the function $p(\mu)$ given by 

$$p(\mu) = 1 - \beta(\mu) \tag{6.24}$$

called the power function of the test of a hypothesis involving $\mu$, is very 
useful in comparing tests of means. It is clear that the power function of 
a test gives the probability of rejecting the null hypothesis when the null 
hypothesis is false; that is 

$$p(\mu) = P[\text{reject } H_0;\ H_a \text{ is true}] \tag{6.25}$$

or 

$$p(\mu) = P[\text{reject } H_0;\ H_0 \text{ is false}]$$

Thus, the power is greatest when the probability of an error of the second 
kind is smallest. A power function has an advantage over the operating 
characteristic function in that it is associated with the critical region just 
as $\alpha$ is. That is 

$$
\begin{aligned}
p(\mu) &= 1 - P[\text{sample mean falling in noncritical region};\ H_a \text{ true}] \\
&= P[\text{sample mean falling in critical region};\ H_a \text{ true}]
\end{aligned}
$$

and 

$$\alpha = P[\text{sample mean falling in critical region};\ H_0 \text{ true}]$$

For the same sample size and the same significance level, test one is said 
to be more powerful than test two, at a specified value of $\mu$, if the power 
of test one is larger than test two, or if the probability of committing a 
type 2 error is smaller for test one. In general, we prefer that test which is 
more powerful. 
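
As an illustration of comparing two tests at the same sample size and significance level, the sketch below evaluates the power of the one-tailed and the two-tailed tests of the yarn example at a few values of $\mu$ below 1000. It is only a sketch under the assumptions of this section (normal population, known $\sigma$); the function names are ours, and `statistics.NormalDist` stands in for the normal table.

```python
from statistics import NormalDist

STD = NormalDist()

def power_one_tailed(mu, mu0=1000.0, sigma=10.0, n=25, alpha=0.05):
    """Power of the test of H0: mu >= mu0 against Ha: mu < mu0."""
    se = sigma / n ** 0.5
    critical_point = mu0 + STD.inv_cdf(alpha) * se
    # Probability that the sample mean falls in the critical region.
    return NormalDist(mu, se).cdf(critical_point)

def power_two_tailed(mu, mu0=1000.0, sigma=10.0, n=25, alpha=0.05):
    """Power of the test of H0: mu = mu0 against Ha: mu != mu0."""
    se = sigma / n ** 0.5
    half_width = STD.inv_cdf(1 - alpha / 2) * se
    xbar = NormalDist(mu, se)
    left = xbar.cdf(mu0 - half_width)           # area over the left part
    right = 1.0 - xbar.cdf(mu0 + half_width)    # area over the right part
    return left + right

for mu in (1000, 998, 996, 994):
    print(mu, round(power_one_tailed(mu), 3), round(power_two_tailed(mu), 3))
```

At $\mu = 1000$ both functions return the significance level, and for true means below 1000 the one-tailed test has the larger power, which is the sense of "more powerful" used above.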


6.5. EXERCISES 

6.1. Of the first 300 babies born in November in a certain city, 180 were boys 
and 120 girls. If it is assumed that these 300 babies represent a random 
sample from a population, estimate by a 95 per cent confidence interval 
the proportion of male births in the population. 

6.2. Of two machines used to test hardness in a laboratory, it is desired to 
determine whether machines A and B are consistent with each other. 
The variance of the readings from each machine is known to be 0.16. 
A standard block was used, and eight determinations made on each 
machine showed sample means of $\bar{x}_A = 66.7$ and $\bar{x}_B = 67.2$. Test the 
appropriate hypothesis at the five per cent level, using the general test 
procedure for testing hypotheses outlined in six steps on p. 218. 

6.3. Two sets of 100 students each were taught to read by two different 
methods, respectively. After instruction was over, a reading test gave the 
following sample results: $\bar{x} = 73.4$, $\bar{y} = 70.3$, $s_x = 8$, and $s_y = 10$. 
Assume that the samples are large enough so that $s_x$ and $s_y$ may be used 
in place of $\sigma_x$ and $\sigma_y$. (a) Determine a 90 per cent confidence interval 
for $\mu_x - \mu_y$. (b) Determine how large an equal-size sample from each 
group should have been used if it is desired to estimate $\mu_x - \mu_y$ to within 
one unit with a probability of 0.93. 

6.4. In a poll taken among college students, 46 out of 200 fraternity men 
favored a certain proposition, and 51 out of 300 nonfraternity men 
favored it. (a) Find a 95 per cent confidence interval for the difference 
in proportions favoring the proposition. (b) Test the hypothesis that the 
true proportions favoring the proposition are equal. 

6.5. Draw the operating characteristic curve for the test in Example 6.2. 
Use the curve and graphic methods to find the probability of making 
the type 2 error when the true mean is 2.44; when it is 2.60. 

6.6. The bacteria content of a food product must be less than 65.0 to be 
acceptable. A sample of 16 cans from a lot of the product has a mean 
content of 65.4. Long experience indicates that the standard deviation of 
the bacteria content is 0.4. (a) If the probability of false rejection of 
the lot is to be 0.05, should the lot be rejected on the basis of the sample 
evidence? (b) Draw the operating characteristic curve for this test. What 
is the chance of nonrejection of a lot which has mean bacteria content 
of 66.0? (c) Draw on the same graph with (b) the operating characteristic 
curve for the above test when samples of size 64 are used in place of 
samples of size 16. Suppose we require that the probability of the type 2 
error not be greater than 0.10. Find the smallest bacteria count for which 
$\alpha = 0.05$, $\beta = 0.10$, and $n = 16$; for which $\alpha = 0.05$, $\beta = 0.10$, and 
$n = 64$. Use this information to make a statement about the effect of 
sample size on the test. (d) Suppose the specification that the true mean 
bacteria content be 65.0 allows the true mean content to be as large as 
65.05 without rejection of the lot. For the five per cent level test that 
the mean bacteria content not be larger than 65.0, determine the sample 
size so that for a sample mean of $\bar{x} = 65.05$ the probability of the type 2 
error is at most 0.10, that is, $\beta = 0.10$. (The reader should observe here 
that we wish to find the sample size which insures that the sample mean 
not be larger than the maximum allowable specification when the sizes 
of the type 1 and type 2 errors are fixed and the true mean actually does 
not exceed 65.0.) 

6.7. (a) Consider the test of $H_0\colon \mu \le \mu_0$ against $H_a\colon \mu > \mu_0$, where $\mu_0$ is 
some constant. Let $\bar{x}$ denote a sample mean and let $d = \bar{x} - \mu_0$. Find 
the smallest sample size $n$ which guarantees that for a fixed $d$ the 
probabilities of the type 1 and type 2 errors will not exceed specific values 
of $\alpha$ and $\beta$, respectively. It is to be understood that the parent population 
is normally distributed with variance $\sigma^2$ and that the critical point is 
denoted by $\bar{x}_0$. [See Exercise 6.6(d) for a numerical approach to the 
problem.] (b) Use the expression derived in (a), the standard normal 
table, and specified values of $D = d/\sigma$, $\alpha$, and $\beta$ to make a table of 
minimum sample sizes. Compute at least eight values, letting $\alpha = 0.05$ 
and 0.01, $\beta = 0.10$ and 0.05, and $D$ equal to values of interest to the 
student. 

6.8. (a) A lot of rolls of paper is acceptable for making bags for grocery 
stores if its mean breaking strength is not less than 40 lb. A random 
sample of 25 pieces of paper from the lot had a mean breaking strength 
of 39 lb. Long experience indicates that the standard deviation of the 
breaking strength is 2 lb. Should the lot be rejected if $\alpha = 0.1$? (b) 
Draw the operating characteristic curve for the test in (a). What is the 
chance of nonrejection of a lot which has true mean breaking strength 
of 39 lb? How does this curve compare with the one drawn in Exercise 
6.6(b)? (c) Draw on the same graph with (b) the operating characteristic 
curve for the above test when $n = 64$ replaces $n = 25$. Use these two 
curves to make a statement about the effect of the sample size on the test 
if $\beta = 0.05$. (d) Suppose the specification that the true mean breaking 
strength be 40 lb allows the true mean breaking strength to be as small as 
39.5 lb without rejection of the lot. For the one per cent level test that 
the mean breaking strength not be smaller than 40 lb, determine the 
sample size so that for a sample mean of $\bar{x} = 39.5$ the probability of 
the type 2 error is at most 0.05. (e) Graph the power functions for the 
two tests discussed in (a) and (c). Use these power functions to determine 
graphically the probability of rejecting $H_0$ when $\mu = 39$ lb; when 
$\mu = 38$ lb. Notice that the test using a sample of size 64 is more powerful 
in both cases than the test using $n = 25$. 

6.9. (a) Consider the test of $H_0\colon \mu \ge \mu_0$ against $H_a\colon \mu < \mu_0$, where $\mu_0$ is 
some constant. Let $\bar{x}$ denote a sample mean and let $d = \bar{x} - \mu_0$. Find 
the smallest sample size $n$ which guarantees that for a fixed $d$ the 
probabilities of the type 1 and type 2 errors will not exceed $\alpha$ and $\beta$, respectively. 
It is understood that the parent population is normally distributed with 
variance $\sigma^2$. [See Exercise 6.8(d) for a numerical approach and Exercise 
6.7(a) for an analogous problem.] (b) Use the expression derived in (a), 
the standard normal table, and specified values of $D = d/\sigma$, $\alpha$, and $\beta$ 
to make a table of minimum sample sizes. Compute at least eight values, 
letting $\alpha = 0.05$ and 0.01, $\beta = 0.10$ and 0.05, and $D$ equal to values of 
interest to the student. 

Note. Even though Exercises 6.7 and 6.9 are for different tests, the 
derivations and calculations of one are like those for the other. 

6.10. (a) Consider the test of $H_0\colon \mu = \mu_0$ against $H_a\colon \mu \ne \mu_0$, where $\mu_0$ is 
some constant. Let $\bar{x}$ denote a sample mean and let $d = |\bar{x} - \mu_0|$. 
Find the smallest sample size $n$ which guarantees that for a fixed $d$ the 
probabilities of the type 1 and type 2 errors will not exceed $\alpha$ and $\beta$, 
respectively. It is to be understood that the parent population is normally 
distributed with variance $\sigma^2$. (b) Use the expression derived in (a), the 
standard normal table, and specified values of $D = d/\sigma$, $\alpha$, and $\beta$ to 
make a table of minimum sample sizes. 

6.11. (a) Find the power function of the test of Exercise 6.2. (b) Find the 
power function for the test resulting from changing the sample size to 
25 and the significance level of the test to 0.01. (c) Graph these two 
functions on the same graph. Give the co-ordinates of the points where 
the curves intersect. 

6.12. According to the Mendelian inheritance theory, certain crosses of peas 
should give smooth and wrinkled seeds in the ratio of 3 to 1; that is, 
the relative frequency of wrinkled seeds is equal to 0.25. In a random 
sample of 720 seeds, 500 were smooth. (a) What is the probability of 
getting exactly 500 smooth seeds? Use both the binomial density 
function and the normal approximation to compute the probability and 
compare results. (b) Use the given information to test the Mendelian 
inheritance theory at the five per cent level. (c) Use the normal curve 
approximation to draw a power curve for the test in (b). From this curve 
determine the power of the test if the true proportion of wrinkled seeds 
is actually 0.20 or 0.50. (d) Indicate how one could obtain a power curve 
using the binomial distribution without applying the normal 
approximation. 

6.13. A random sample of size 49 was drawn from a normal distribution 
with a variance of 25. The mean and median of the sample were 21.3 
and 22.1, respectively. Assume that 49 is a large sample size. (a) Find 
a 95 per cent confidence interval for the population mean, using the 
sample mean; using the sample median. (b) Compare the two intervals 
found in (a), writing a short summary statement. (c) What sample size 
would be required in order that the 95 per cent confidence interval for 
the population mean, as determined from a sample median, be one-half 
the length of the interval determined from the sample mean in (a)? 
Answer the same question if the term "one-half" is replaced by "k times," 
where $k$ is any positive real number. (d) Find the confidence coefficient 
which gives a confidence interval based on the sample median which 
has the same length as the 95 per cent confidence interval based on 
the sample mean. 

6.14. The $i$th observation of a random sample of size $n$ is drawn from a normal 
distribution with mean $\mu_i = \alpha + \beta k_i$ and variance 
$\sigma_i^2 = \sigma^2$, where $\alpha$, $\beta$, $k_i$, and $\sigma^2$ 
are real constants. Let 

$$b = \sum_{i=1}^{n} (k_i - \bar{k})(x_i - \bar{x})$$

Find the sampling distribution of $b$, giving the mean and variance in 
simplest form. 

6.15. Assuming that Theorems 6.1 and 6.4 hold, prove Theorems 6.2 and 6.5. 

6.16. Assuming that Theorems 6.1 through 6.5 hold, prove Theorems 6.6 
and 6.7. 

6.17. Let $f(x_1, \ldots, x_n)$ denote the density function of the random variables 
$x_1, \ldots, x_n$ and let $h = h(x_1, \ldots, x_n)$ be any single-valued function of 
these variates. Then the $k$th moment about the origin of $h$ is defined 
by 

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h^k(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n \tag{6.26}$$

and the moment generating function of $h$ is defined by 

$$M_h(t) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{th} f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n \tag{6.27}$$

(a) Prove that if the variates $x_1, \ldots, x_n$ are independently distributed 
and 

$$h = a_1 x_1 + \cdots + a_n x_n$$

where $a_1, \ldots, a_n$ are real numbers not all zero, then 

$$M_h(t) = M_{x_1}(a_1 t) \cdots M_{x_n}(a_n t) \tag{6.28}$$

(b) Further, prove that when $h = \bar{x}$, and $x_i = x$ is normally distributed 
with mean $\mu$ and variance $\sigma^2$, then 

$$M_{\bar{x}}(t) = M_x^n\!\left(\frac{t}{n}\right) \tag{6.29}$$

(c) In Exercise 3.49 we showed that when $x$ is normally distributed 
with mean $\mu$ and variance $\sigma^2$, then $u = (x - \mu)/\sigma$ has moment 
generating function $M_u(t) = e^{t^2/2}$. Use the MGF for $u$ to find the MGF of 
$x$, where 

$$x = \sigma u + \mu$$

(d) Use the MGF of $x$ obtained in (c) along with the two important 
properties of MGF's mentioned in Exercise 5.69 to show that Theorem 
6.2 holds. Note that in this way Theorem 6.2 can be proved in its own 
right. However, it is interesting to prove Theorem 6.1 directly and then 
show, as in Exercise 6.15, that Theorem 6.2 is a corollary. (e) Prove 
Theorem 6.1, using the two properties of MGF's mentioned in Exercise 
5.69 along with Eq. (6.28). 

6.18. In Exercise 5.69 we proved the central limit theorem in case the MGF 
exists. That is, we proved that when $x$ is distributed with mean $\mu$ and 
variance $\sigma^2$, then $\bar{x} \sim n(\mu, \sigma^2/n)$ for large $n$ [or $y = (\bar{x} - \mu)/(\sigma/\sqrt{n}) \sim n(0, 1)$ 
if $n$ is sufficiently large]. Referring to Theorem 6.4, we note that 
the $x$ which represents the number of successes in $n$ independent trials 
is actually an $n\bar{x}$ in a sample of size $n$ when we think of each success as 
having the variate value 1 and each failure the variate value 0. Further, 
since $np$ is the mean and $np(1 - p)$ the variance of the binomial variate 
$x$, we may write $(x - np)/\sqrt{np(1 - p)}$ in place of $(\bar{x} - \mu)/(\sigma/\sqrt{n})$ 
to complete the proof of Theorem 6.4. 

There is a proof of Theorem 6.4 which does not depend on the 
central limit theorem, but does depend on MGF's. Let $x$ be distributed 
as a binomial. Then 

$$y = \frac{x - np}{\sqrt{np(1 - p)}}$$

may be considered a standardized binomial variate. Since the MGF of 
$x$, according to Exercise 5.20, can be shown to be 

$$M_x(t) = [(1 - p) + pe^t]^n \tag{6.30}$$

it follows that 

$$M_y(t) = \left[(1 - p)\, e^{-pt/\sqrt{np(1-p)}} + p\, e^{(1-p)t/\sqrt{np(1-p)}}\right]^n \tag{6.31}$$

Further, using the appropriate series expansion, we find that 

$$M_y(t) = \left[1 + \frac{t^2}{2n} + \sum_{k=3}^{\infty} \frac{c_k t^k}{n^{k/2}}\right]^n, \qquad k = 3, 4, \ldots \tag{6.32}$$

where the coefficients $c_k$ do not depend on $n$. 
The proof of the theorem is complete on showing that 

$$\lim_{n \to \infty} M_y(t) = M_u(t) = e^{t^2/2} \tag{6.33}$$

The student should prove Theorem 6.4 by first proving Eqs. (6.31), 
(6.32), and (6.33). 

6.19. The median is a particular quantile (Sect. 2.2). Let $x_p$ denote any sample 
quantile. Then it can be shown that for large $n$ the variance of the 
quantile $x_p$ is given by 

$$V(x_p) = \frac{p(1 - p)}{n f_p^2}$$

where $n$ denotes sample size, $1 - p$ the proportion of variate values 
below $x_p$, and $f_p = f(x_p)$ the ordinate of the quantile $x_p$ in the parent 
distribution. (a) Show that when $x_p$ is the median and the parent 
population is normal the standard error of the median is approximately 
$\sqrt{\pi/2}\;\sigma/\sqrt{n}$, $\sigma$ being the standard deviation of the normal distribution. 
(b) Find the standard error of the first quartile of a sample of size $n$ 
drawn from a normal distribution with variance $\sigma^2$. 

6.20. A control chart for the mean of samples of size five is to be constructed. 
Suppose that the process is in control when $\mu = 20$ and $\sigma^2 = 40$. (a) The 
upper and lower control limits for sample means are to be three standard 
deviations from the center line. Determine these limits. (b) The symmetric 
control limits for sample means are to be located so that the probability 
of a sample mean's falling outside is 0.006. What are the limits? 

6.21. Suppose that a process is in control when $\mu = 10$ and $\sigma^2 = 25$. For 
random samples of size four drawn each hour for 40 consecutive hours, 
the means are as follows: 


Table 6.4 

Sample   Mean     Sample   Mean     Sample   Mean     Sample   Mean 
   1      9.0       11      8.0       21     10.5       31     [illegible] 
   2      6.7       12     11.0       22     18.7       32     [illegible] 
   3     13.0       13     10.5       23      7.7       33     [illegible] 
   4     12.3       14     12.0       24      9.7       34     [illegible] 
   5      7.5       15     11.0       25      9.0       35     [illegible] 
   6      9.7       16     10.7       26     10.7       36     [illegible] 
   7     10.3       17     11.5       27     12.0       37     [illegible] 
   8     10.3       18      6.7       28      9.7       38     [illegible] 
   9     13.0       19      9.5       29     12.5       39     [illegible] 
  10     14.5       20      7.5       30     11.5       40     [illegible] 

(a) Plot the 40 means on a control chart with upper and lower control 
lines three standard deviations from the center line. Does it appear 
that the process is out of control at any point? (b) Assume for the moment 
that the true population mean is unknown but that the variance is 25. 
Use the first 25 sample means to estimate the position of the center line, 
and to locate the three-standard deviation control lines. Plot the last 
15 data values on such a control chart and determine if the process is 
in control after the twenty-fifth hour. (If the center value is not known, 
the usual practice is to take the first 25 sample values to establish the 
standard and then use the standard until there is evidence that it should 
be changed.) 

Note. The reader should supply his own interpretation of the data in 
Exercises 6.21 and 6.22. The means might represent (in coded form) such 
things as life of an electric lamp, tensile strength of thread, diameter 
of ball bearings, life of a washer, number of surface defects, chemical 
composition of steel, and blowout time of a fuse. The proportion of 
defective units per sample (or lot) might refer to cigarette lighters, light 
bulbs for Christmas trees, transistors, spools of yarn, errors on a typed 
page, and ill students. 

6.22. The procedure for preparing a control chart of proportion (or number) 
of defective units in repeated samples is similar to that used for sample 
means. [Theorem 6.5 (or 6.4) may be used for this purpose. See the note 
at the end of Exercise 6.21 for possible interpretations.] 

Suppose that a process is in control when the proportion of defective 
units is p = 0 M or less. Suppose that samples of size 100 are taken at 
regular intervals and the number of defective units determined. (a) Let 
the lower control limit for the number of defective units be zero. Find 
the upper control limit for which the probability is 0.005 that a sample 
number defective will exceed it. (b) Make a control chart with the limits 
of (a), plot the following 40 sample "number of defective units" on 
this chart, and comment on the state of control. The number of defective 
units in 40 consecutive samples of 100 are (read across): 

3S2«334 152 

4384501 £53 

4353913 345 

0634126 12 25 

(c) Use the first 25 samples to estimate the true number of defective 
units. Use this estimate to work (a) and (b). 


SAMPLING FROM NORMAL POPULATIONS: 
THE CHI-SQUARE DISTRIBUTION 


We have considered (Chaps. 5 and 6) the nature of sampling distributions 
of means and medians. In this chapter we study properties and important 
applications of the sampling distribution of a measure of dispersion, namely, 
the chi-square distribution. Power curves are studied as they relate to tests 
of hypotheses and to the determination of sample size. 

7.1. INTRODUCTION 

There are problems in which our first interest is in making some 
statement about a measure of the population dispersion. In case the population 
is normally distributed we use the sample variance to make such a 
statement, since other statistics measuring dispersion are not as reliable. The 
chi-square distribution is introduced for problems involving the sample 
variance. 

For the convenience of the reader we now give in one group statements 
of four important theorems relating to measures of dispersion of samples 
drawn randomly from normal distributions. Following these theorems we 
give illustrations of applications and, in the set of exercises, outlines of 
proofs of key theorems. (Proofs of one or more of the four theorems may 
be found in each of the ten references at the end of the chapter.) 

Theorem 7.1. Let 

$$w = \sum_{i=1}^{n} t_i^2 = \sum_{i=1}^{n} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2$$

where the $x_i$ are normally and independently distributed with means $\mu_i$ and 
variances $\sigma_i^2$ and $t_i = (x_i - \mu_i)/\sigma_i$. Then $w$ is distributed with density function 

$$f(w; n) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, w^{(n/2) - 1} e^{-w/2}, \qquad w > 0 \tag{7.1}$$

mean $n$, and variance $2n$. The gamma function $\Gamma(n/2)$ is defined in Exercise 
3.67. 

Note. The distribution of the random variable $w$ is a special case of the 
gamma distribution and is known as the chi-square distribution with $n$ 
degrees of freedom, $n$ being the parameter which is the mean. Thus, it is 
natural that we use the square of the Greek letter chi, $\chi^2$, in place of $w$. 
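
The statement that the density $f(w; n) = w^{(n/2)-1} e^{-w/2} / \left(2^{n/2}\,\Gamma(n/2)\right)$ has unit area, mean $n$, and variance $2n$ can be checked numerically. The sketch below is not a proof; it uses a plain midpoint rule with an upper limit and step count chosen by us, and it is restricted to $n \ge 3$, where the density is bounded near zero.

```python
import math

def chi_square_density(w, n):
    """Chi-square density with n degrees of freedom, evaluated at w > 0."""
    return w ** (n / 2 - 1) * math.exp(-w / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def moments(n, upper=200.0, steps=200_000):
    """Midpoint-rule approximations to the area, mean, and variance."""
    h = upper / steps
    area = mean = second = 0.0
    for i in range(steps):
        w = (i + 0.5) * h                      # midpoint of the i-th strip
        f = chi_square_density(w, n) * h       # probability mass of the strip
        area += f
        mean += w * f
        second += w * w * f
    return area, mean, second - mean ** 2

for n in (4, 10):
    area, mean, var = moments(n)
    print(n, round(area, 4), round(mean, 4), round(var, 4))
```

The printed mean and variance should be close to $n$ and $2n$ for each value of $n$ tried.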

Theorem 7.2. The sum of $k$ independent random variables $\chi_1^2, \ldots, \chi_k^2$ 
having chi-square distributions with $\nu_1, \ldots, \nu_k$ degrees of freedom, 
respectively, is distributed as chi-square with $\nu_1 + \cdots + \nu_k$ degrees of freedom. 

Theorem 7.3. The sample mean and variance are independent random 
variables when one is sampling (randomly) from a normal population. 

Theorem 7.4. If a random sample $x_1, \ldots, x_n$ is drawn from a normal 
population with mean $\mu$ and variance $\sigma^2$, then the statistic 

$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^2}$$

which is a random variable, has the $\chi^2$ distribution with $n - 1$ degrees of 
freedom. 
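
Theorem 7.4 can be made plausible by simulation. The sketch below draws repeated samples from a hypothetical normal population (the values of `mu`, `sigma`, `n`, and the number of repetitions are our assumptions, not from the text) and compares the average and variance of the statistic with the values $n - 1$ and $2(n - 1)$ that Theorem 7.1 gives for a chi-square variate with $n - 1$ degrees of freedom.

```python
import random
from statistics import fmean, variance

random.seed(0)  # fixed seed so that the run is repeatable

mu, sigma, n = 50.0, 2.0, 5   # hypothetical population and sample size
reps = 20_000                 # number of simulated samples

values = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = fmean(sample)
    ss = sum((x - xbar) ** 2 for x in sample)   # sum of squares about the sample mean
    values.append(ss / sigma ** 2)              # the statistic of Theorem 7.4

print("average of statistic: ", round(fmean(values), 2), " theory:", n - 1)
print("variance of statistic:", round(variance(values), 2), " theory:", 2 * (n - 1))
```

Agreement of two moments does not establish the distribution, but replacing `xbar` by `mu` in the sum of squares moves the average toward $n$, which illustrates the loss of one degree of freedom.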

The $\chi^2$ distribution is one of the most important sampling distributions 
in statistics. The illustrations given in this chapter represent only a small 
portion of the many applications of this distribution. Before we present 
applications, let us be sure that we understand the theorems. 

First, the reader should note that the term "chi square" is used in two 
ways: it denotes the variate and is also the name of the distribution. This 
may cause some difficulty in the beginning, since this is not the case with 
other distributions mentioned so far. For example, the variate of the 
standard normal distribution is called $t$; it is not called "normal." 

7.2. THE CHI-SQUARE DISTRIBUTION: PROPERTIES 

Due to convention, the variate $w$ and parameter $n$ are usually replaced 
by $\chi^2$ and $\nu$, respectively, and the density function (7.1) is written as 

$$f(\chi^2; \nu) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, (\chi^2)^{(\nu/2) - 1} e^{-\chi^2/2}, \qquad \chi^2 > 0 \tag{7.2}$$


The Greek letter $\nu$, nu, is used in place of $n$ and refers to degrees of 
freedom as well as to population mean. The density function (7.2) represents 
a one-parameter family of distributions. The graph of three members of 
this family is shown in Fig. 7.1, which illustrates the fact that the "shape" 
of the distribution changes with the degrees of freedom. 

Fig. 7.1. $\chi^2$ Distribution for $\nu = 1$, 4, and 10. Percentage Points are Indicated 

This being the case, we cannot use a single simple standardization process as with the 
normal distribution (where it was necessary to table only values for the 
standard normal variate $t$). Thus, it is necessary that the proportion 
(percentage) $\alpha$ of the area under the $\chi^2$ curves to the right of the point $\chi_\alpha^2$ 
be computed for each degree of freedom. That is, for any $\alpha$, $\chi_\alpha^2$ is that 
value such that 

$$P[\chi^2 > \chi_\alpha^2] = \int_{\chi_\alpha^2}^{\infty} f(\chi^2; \nu)\, d\chi^2 = \alpha \tag{7.3}$$

Values of $\chi_\alpha^2$ corresponding to selected values of $\alpha$ are shown in Table IV. 
The proportion of the area to the right of a $\chi_\alpha^2$ point is used instead of that 
to the left, since this is normally the way the $\chi^2$ distribution is applied. 
Since the distribution is nonsymmetric, it is necessary that the upper 
and lower $\alpha$ level points be computed separately. 
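
For readers without the table at hand, the defining relation $P[\chi^2 > \chi_\alpha^2] = \alpha$ can be solved numerically. The sketch below is limited to even degrees of freedom, for which the upper-tail area has the closed form $e^{-x/2} \sum_{j=0}^{\nu/2 - 1} (x/2)^j / j!$; this identity is a standard result that is not derived in the text, and the function names are ours. The point is located by bisection.

```python
import math

def upper_tail_area(x, nu):
    """P[chi-square > x] for a positive even number nu of degrees of freedom."""
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this closed form needs a positive even nu")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, nu // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def upper_point(alpha, nu, tol=1e-10):
    """The value with area alpha to its right, found by bisection."""
    lo, hi = 0.0, 1.0
    while upper_tail_area(hi, nu) > alpha:   # widen until the point is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if upper_tail_area(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for nu in (4, 10):
    print(nu, round(upper_point(0.05, nu), 3), round(upper_point(0.95, nu), 3))
```

Because the distribution is not symmetric, the upper and lower points for a given $\nu$ are found by two separate calls, as the text notes.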

Often we assume that a sample is randomly drawn from a single normal 
population with mean $\mu$ and variance $\sigma^2$. In this case, $w$ in Theorem 7.1 
becomes 

$$\chi^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{\sigma^2}$$

But since the mean of a population is usually unknown, we use Theorem 
7.4 rather than Theorem 7.1 in problems involving the population variance 
$\sigma^2$. According to these theorems $\chi^2$ is distributed with one fewer degree of 
freedom when $\bar{x}$ is used instead of $\mu$. This is the reason we sometimes say 
that one degree of freedom is used up in finding the estimate $\bar{x}$. 

The useful statistic of Theorem 7.4 may also be written as 

$$\frac{SSx}{\sigma^2} = \frac{(n - 1)s^2}{\sigma^2} = \chi^2 \tag{7.4}$$

with $n - 1$ degrees of freedom, where $SSx$ denotes sum of squares of 
$x$ deviates and $s^2$ denotes the sample variance. Since the distribution of $\chi^2$ 
depends on the number of degrees of freedom, we sometimes write 
"$\chi^2(n - 1)$" in place of "$\chi^2$ with $n - 1$ degrees of freedom." From Eq. 
(7.4) it is clear that the sampling distribution of $s^2$ is easily obtained from 
the distribution of $\chi^2(n - 1)$ and that 

$$s^2 = \frac{\sigma^2 \chi^2(n - 1)}{n - 1} \tag{7.5}$$

It should be apparent that Eq. (7.4) shows a relation between $\chi^2$ and $s^2$ 
in very much the same way that $t = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ relates $t$ and $\bar{x}$. 
Hence, we sometimes refer to the statistic $\chi^2$ as a standardized sample 
variance in much the same way we think of the statistic $t$ as a standardized 
sample mean. 

The relation in Eq. (7.4) may be used to establish a confidence interval
for $\sigma^2$ with confidence coefficient $\gamma = 1 - \alpha$. This may be done by first
finding two values of $\chi^2$, say $\chi_1^2$ and $\chi_2^2$, such that

$$P[\chi_1^2 < \chi^2 < \chi_2^2] = \int_{\chi_1^2}^{\chi_2^2} f(\chi^2; \nu)\, d\chi^2 = \gamma \tag{7.6}$$

and then changing the inequalities, using Eq. (7.4), to obtain

$$P\left[\frac{SSx}{\chi_2^2} < \sigma^2 < \frac{SSx}{\chi_1^2}\right] = \gamma. \tag{7.7}$$

The length of the $100\gamma$ per cent confidence interval given by Eq. (7.7) is

$$SSx\left[\frac{1}{\chi_1^2} - \frac{1}{\chi_2^2}\right]$$

and is shortest when

$$\frac{1}{\chi_1^2} - \frac{1}{\chi_2^2}$$

is a minimum, $SSx$ being fixed for a particular sample. If appropriate tables
were available, $\chi_1^2$ and $\chi_2^2$ could be found by inspection so as to minimize
$[(1/\chi_1^2) - (1/\chi_2^2)]$, but in their absence the required computation using
Eq. (7.6) is usually considered too long to be practical. Instead, in setting
up a $100(1 - \alpha)$ per cent confidence interval, we usually choose
$\chi_1^2 = \chi^2_{1-(\alpha/2)}$ and $\chi_2^2 = \chi^2_{\alpha/2}$; that is, $\chi_1^2$ and $\chi_2^2$ are selected so that $\alpha/2$ of
the area is in each tail of the distribution. Such a choice gives an interval
with length very near the minimum unless $\nu$ is small. Sometimes in practice
zero is taken as the lower bound of the interval and the upper limit is
selected so that $\chi_1^2 = \chi^2_{1-\alpha}$.


7.3. APPLICATIONS OF THE CHI-SQUARE DISTRIBUTION


Example 7.1. Machine A is to be compared with a standard for the pre-
cision with which it cuts off pieces. The variance for the standard machine
B is 0.030. A random sample of ten pieces cut off by machine A has a variance
of 0.058. (a) Find a 95 per cent confidence interval for the variance $\sigma_A^2$ of
machine A. (b) Is there a real difference between the variances of machines
A and B?

Now $SSx = (10 - 1)(0.058) = 0.522$, and from Table IV, using nine
degrees of freedom, we find $\chi_1^2 = \chi^2_{.975} = 2.700$ and $\chi_2^2 = \chi^2_{.025} = 19.02$. Thus,
according to Eq. (7.7), a 95 per cent confidence interval for $\sigma_A^2$ is given by

$$\frac{0.522}{19.02} < \sigma_A^2 < \frac{0.522}{2.700}$$

or

$$0.027 < \sigma_A^2 < 0.193.$$

Taking zero as the lower bound and $\chi_1^2 = \chi^2_{.95} = 3.325$, we obtain
$SSx/\chi_1^2 = 0.157$ and a 95 per cent confidence interval

$$0 < \sigma_A^2 < 0.157.$$

The second interval is shorter than the first interval, but neither is the shortest
95 per cent confidence interval.
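
The two intervals of part (a) can be reproduced numerically. The sketch below
is an illustration under stated assumptions, not the text's own procedure: it
replaces the Table IV lookup with a bisection on a series evaluation of the
$\chi^2$ distribution function, and the function names are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """P[chi-square(df) <= x] from the incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_upper_point(alpha, df):
    """Value with area alpha to its right (stands in for Table IV)."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_interval(s2, n, gamma=0.95):
    """Equal-tails interval of Eq. (7.7): SSx/chi2_2 < sigma^2 < SSx/chi2_1."""
    ss, df, alpha = (n - 1) * s2, n - 1, 1.0 - gamma
    chi2_1 = chi2_upper_point(1.0 - alpha / 2.0, df)   # lower point
    chi2_2 = chi2_upper_point(alpha / 2.0, df)         # upper point
    return ss / chi2_2, ss / chi2_1


def variance_upper_bound(s2, n, gamma=0.95):
    """Interval with lower bound zero: 0 < sigma^2 < SSx/chi2_gamma."""
    return (n - 1) * s2 / chi2_upper_point(gamma, n - 1)


low, high = variance_interval(0.058, 10)
print(f"equal tails: {low:.3f} < sigma^2 < {high:.3f}")
print(f"zero lower bound: 0 < sigma^2 < {variance_upper_bound(0.058, 10):.3f}")
```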

Earlier we noted that point estimates are not very useful. However, it
should be mentioned here that $s^2 = 0.058$ is the best single estimate of the
variance in the sense that the mean of the sampling distribution of $s^2$ is
$\sigma^2$, and the variance is smaller than that for other measures of dispersion
commonly used. These statements are based upon the assumption that the
parent population is normal.

The (b) part of Example 7.1 is answered by following the general pro-
cedure for testing hypotheses given on p. 218.

1. $H_0$: $\sigma^2 = 0.030$ and $H_a$: $\sigma^2 \ne 0.030$.

2. Assume that the ten sample values were independently obtained from
the same normal population. Let the significance level be $\alpha = 0.02$.

3. The statistic to be tested is

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{9s^2}{0.030}.$$

$\chi^2$ is generally easier to use than $s^2$, since critical values can be read
directly from the tables.

4. Since the alternative hypothesis includes values of $\sigma^2$ both larger and
smaller than 0.030, we use a two-tailed test with one per cent of the
area in each tail. Thus, the critical region includes all those values of
$\chi^2$ for which $\chi^2 < 2.088$ or $\chi^2 > 21.666$.

5. The sample statistic is

$$\chi^2 = \frac{9(0.058)}{0.030} = 17.4.$$

6. Since 17.4 does not fall in the critical region, we fail to reject the null 
hypothesis. That is, we do not have enough evidence to say that the 
population variance is different from 0.030. (As a practical matter, 
if the assumptions of step 2 are considered realistic and the person 
responsible for action agrees to the test procedure, then machine A 
would probably be allowed to go on cutting pieces with the provision 
that it be checked regularly.) 
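
Steps 3 through 6 can be carried out mechanically. The following sketch is
illustrative only: it assumes the standard library, computes the critical
points by bisection instead of reading them from Table IV, and uses the
hypothetical function name `two_tailed_variance_test`.

```python
import math


def chi2_cdf(x, df):
    """P[chi-square(df) <= x] from the incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_upper_point(alpha, df):
    """Value with area alpha to its right."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def two_tailed_variance_test(s2, n, sigma0_sq, alpha):
    """Return (statistic, lower critical point, upper critical point, reject?)."""
    df = n - 1
    statistic = df * s2 / sigma0_sq                    # step 3
    lower = chi2_upper_point(1.0 - alpha / 2.0, df)    # step 4, left tail
    upper = chi2_upper_point(alpha / 2.0, df)          # step 4, right tail
    reject = statistic < lower or statistic > upper    # step 6
    return statistic, lower, upper, reject


stat, lower, upper, reject = two_tailed_variance_test(0.058, 10, 0.030, 0.02)
print(f"chi2 = {stat:.1f}, critical region: < {lower:.3f} or > {upper:.3f}")
print("reject H0" if reject else "fail to reject H0")
```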

The reader should realize that Example 7.1 was presented for illustrative 
purposes. An investigator would not normally solve both (a) and (b) with 
one set of data, nor would he be likely to use a 95 per cent confidence inter- 
val in one part and a two per cent significance level in the other. Further, 
he would be unwise to let $\alpha$ have such a small value without taking into
account the size of the type 2 error and the consequences of making such 
an error. We illustrate this problem in Example 7.4. 

On letting $n = 1$ in Theorem 7.1, we see that $w = t^2 = \chi^2$ is distributed
as $\chi^2$ with one degree of freedom, provided $t$ is distributed normally with
zero mean and unit variance. The connection between these distributions
is made clearer in the following example.

Example 7.2. Relate intervals of the $t$ variate with values of the $t^2 = \chi^2$
variate.

Consider the symmetric interval $-t_0 < t < t_0$, where $t_0$ is a positive
real number. The square of any value between 0 and $t_0$ is a value between
0 and $t_0^2$, and the square of any value between $-t_0$ and 0 is a value between
$(-t_0)^2 = t_0^2$ and 0. Thus, if $t$ is a value in the interval $-t_0 < t < t_0$, then
$t^2$ is a value in the interval $0 \le t^2 < t_0^2$ and

$$P[-t_0 < t < t_0] = P[0 < t^2 < t_0^2].$$

In particular, if $t_0 = 0.126$, $t_0^2 = 0.0159$ and, according to Tables II and IV,
we see that

$$P[-0.126 < t < 0.126] = 0.10$$

and

$$P[0 < \chi^2 < 0.0159] = 0.10.$$

That is, that portion of the normal distribution which is symmetric about
the mean becomes the left tail of the $\chi^2$ distribution.

If $t_0 = 1.96$, then

$$P[t > 1.96] = 0.025 = P[t < -1.96]$$

so that

$$P[t^2 > (1.96)^2] = P[\chi^2 > 3.84] = 0.05.$$

That is, five per cent of the $\chi^2$ values are greater than 3.84 when $t$ falls in the
intervals $t < -1.96$ or $t > 1.96$. So that portion of the normal distribution
which is in two symmetric tails becomes the right tail of the $\chi^2$ distribution.
From this it should be clear that any two-tailed $\alpha$ level test using symmetric
tails of the normal distribution as a critical region is the same test as a one-
tailed test using the right tail of a $\chi^2$ distribution with one degree of freedom
as the critical region. (However, no $\chi^2$ test can replace a one-tailed normal
test, nor is it reasonable to replace a two-tailed $\chi^2$ test with a three-interval
normal test. The student should be cautioned that the above statements
apply only for the $\chi^2$ distribution with one degree of freedom.)
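
The correspondence in Example 7.2 can be checked numerically. This is a
sketch under the assumption that the standard normal probabilities are taken
from `math.erf` and the $\chi^2$ probabilities from a series evaluation; both
function names are illustrative.

```python
import math


def std_normal_cdf(t):
    """P[T <= t] for the standard normal variate."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def chi2_cdf(x, df):
    """P[chi-square(df) <= x] from the incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


for t0 in (0.126, 1.96):
    central = std_normal_cdf(t0) - std_normal_cdf(-t0)   # P[-t0 < t < t0]
    left_tail = chi2_cdf(t0 ** 2, 1)                     # P[0 < chi2 < t0^2]
    print(f"t0 = {t0}: normal centre = {central:.4f}, "
          f"chi2(1) left tail = {left_tail:.4f}, "
          f"chi2(1) right tail = {1.0 - left_tail:.4f}")
```

The central normal area and the left-tail $\chi^2$ area agree for each $t_0$, and
the two normal tails together equal the right-tail $\chi^2$ area.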


7.4. DEGREES OF FREEDOM

The $\chi^2$ distribution is represented by a family of curves. There is one
curve for each value of the parameter $\nu$. The particular value of the parameter
which determines a curve is also the mean of the distribution or the number
of degrees of freedom of the distribution which the curve represents.
Since the distribution of the statistic $\sum (x_i - \bar{x})^2$ $[= (n - 1)s^2]$ for
sample size $n$ depends on the $\chi^2$ distribution with $\nu = n - 1$ degrees of
freedom, and, since we say, “$\chi^2$ is distributed as $\chi^2$ with $n - 1$ degrees of
freedom,” it is only natural that we say “$SSx$ is distributed [as $\sigma^2\chi^2(n - 1)$]
with $n - 1$ degrees of freedom.” Thus, we say that the estimator $s^2$ of $\sigma^2$ as
well as the statistic $(n - 1)s^2$ has $n - 1$ degrees of freedom, $n - 1$ being a
constant.

So far we have considered the number of degrees of freedom associated
with the variance estimator obtained from a sample of size $n$ drawn from a
single normal population with variance $\sigma^2$. However, independent samples
of sizes $n_1$ and $n_2$ obtained from two populations with a common variance
$\sigma^2$ may be used to estimate $\sigma^2$. In this case, we say the variance estimator
$s^2$ has $\nu = (n_1 - 1) + (n_2 - 1)$ degrees of freedom. One argument for say-
ing this goes as follows. If the statistics $SSx_1 = (n_1 - 1)s_1^2$ and $SSx_2 =
(n_2 - 1)s_2^2$ are computed from independent samples, they are independent
random variables. Thus

$$\chi_1^2 = \frac{SSx_1}{\sigma^2} \qquad \text{and} \qquad \chi_2^2 = \frac{SSx_2}{\sigma^2}$$

are also independent random variables distributed with $\nu_1 = n_1 - 1$ and
$\nu_2 = n_2 - 1$ degrees of freedom, respectively, and, according to Theorem
7.2,

$$\chi^2 = \chi_1^2 + \chi_2^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{\sigma^2} \tag{7.8}$$

is distributed with $\nu = \nu_1 + \nu_2 = n_1 + n_2 - 2$ degrees of freedom. Hence,
we say that

$$SSx_1 + SSx_2 = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 = \sigma^2(\chi_1^2 + \chi_2^2)$$

is distributed [as $\sigma^2(\chi_1^2 + \chi_2^2) = \sigma^2\chi^2$] with $n_1 + n_2 - 2$ degrees of freedom
and that

$$s^2 = \frac{SSx_1 + SSx_2}{n_1 + n_2 - 2}$$

has $n_1 + n_2 - 2$ degrees of freedom. In general, when $SSx_1 = (n_1 - 1)s_1^2$,
. . . , $SSx_k = (n_k - 1)s_k^2$ are computed from independent samples drawn
from $k$ normal populations with a common variance $\sigma^2$, we say that $SSx_1 +
\cdots + SSx_k$ has $(n_1 - 1) + \cdots + (n_k - 1)$ degrees of freedom and that
the estimator

$$s^2 = \frac{SSx_1 + \cdots + SSx_k}{(n_1 - 1) + \cdots + (n_k - 1)}$$

has $n_1 + \cdots + n_k - k$ degrees of freedom.
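
As an illustration of the pooled estimator for $k$ samples, the sketch below
adds the sums of squares and the degrees of freedom separately before
dividing. The data are hypothetical, chosen only to make the arithmetic easy
to follow, and `pooled_variance` is an illustrative name.

```python
from statistics import variance


def pooled_variance(samples):
    """Return (pooled s^2, degrees of freedom) for samples assumed to share
    a common population variance."""
    ss_total = sum((len(s) - 1) * variance(s) for s in samples)  # SSx_1 + ... + SSx_k
    df_total = sum(len(s) - 1 for s in samples)                  # n_1 + ... + n_k - k
    return ss_total / df_total, df_total


# Hypothetical samples: SSx_1 = 2 with 2 d.f., SSx_2 = 20 with 3 d.f.
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]
s2, df = pooled_variance(samples)
print(f"pooled s^2 = {s2:.2f} with {df} degrees of freedom")
```

Note that the pooled estimate weights each sample variance by its own degrees
of freedom; it is not the simple average of the sample variances.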

The above discussion indicates how the number of degrees of freedom
of a variance estimator associated with a $\chi^2$ distribution may be found, but
it does not necessarily give the student an insight into a real understanding
of “degrees of freedom.” The term was introduced by Fisher, who apparently
had in mind the degrees of freedom of a dynamical system, that is, the number
of independent co-ordinate values necessary to determine the system. At
the present time the term is used in different senses, as discussed below.

In selecting a random sample of $n$ values from a population, we select
a value for each of $n$ variables, each variable ranging over the population
of values. In this case the selection involves $n$ degrees of freedom of choice,
since the $n$ values can be arbitrarily obtained within the specification of the
system.

Considerations of statistics obtained from samples may lead to restrictions
in the form of linear relations among sample values. For example, if the
mean $\bar{x}$ of a sample of $n$ values $x_1, \ldots, x_n$ is fixed, and if $n - 1$ of the values
of $x$ are selected arbitrarily, the remaining value is determined, and we
say that there are $n - 1$ degrees of freedom of choice. In general, if $p$ in-
dependent linear relations [9] are imposed on the $n$ values in a sample,
only $n - p$ of the values can be selected arbitrarily, and we say that there
are $n - p$ degrees of freedom of choice.

Further, if a sample of size $n$ is grouped into $k$ intervals and the number
of values in $k - 1$ intervals is arbitrarily assigned, then the number of
values falling in the $k$th interval is determined. In this case we say there
are $k - 1$ degrees of freedom.

We refer most often to the number of degrees of freedom associated
with quadratic forms. In the expression $SSx = \sum (x_i - \bar{x})^2$ there are
only $n - 1$ arbitrary choices possible on the deviates $e_i = x_i - \bar{x}$ when
the mean $\bar{x}$ is fixed (that is, when there is only one linear relation restricting
the $x$ values). In general, if $k$ independent linear relations are imposed on
the $n$ values in a quadratic form, then it is possible by a suitable nonsingular
linear transformation on the variables to express the quadratic form as
the sum of squares of exactly $n - k$ of the new independent variables.
The number of degrees of freedom is $n - k$, being the same as the rank of
the original quadratic form [2, 6]. For example, in the simple case where
there is only one linear restriction of the form $\sum x_i = n\bar{x}$, we may write,
when $n = 2$,

$$SSx = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 = \left(\frac{x_1 - x_2}{\sqrt{2}}\right)^2 \tag{7.9}$$

or

$$e_1^2 + e_2^2 = q_1^2$$

where $q_1 = (x_1 - x_2)/\sqrt{2}$, and when $n = 3$,

$$SSx = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2
= \left(\frac{x_1 - x_2}{\sqrt{2}}\right)^2 + \left(\frac{x_1 + x_2 - 2x_3}{\sqrt{6}}\right)^2 \tag{7.10}$$

or

$$e_1^2 + e_2^2 + e_3^2 = q_1^2 + q_2^2$$

where $q_2 = (x_1 + x_2 - 2x_3)/\sqrt{6}$.

When SSx is fixed, the degrees of freedom for Eqs. (7.9) and (7.10) are 
obviously 2-1 = 1 and 3-1=2, respectively. 
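
Equations (7.9) and (7.10) can be checked numerically. The sketch below
assumes that the pattern of $q_1$ and $q_2$ continues as
$q_j = (x_1 + \cdots + x_j - j x_{j+1})/\sqrt{j(j+1)}$, which is the usual
Helmert form; that generalization and the sample values are assumptions made
for illustration, not statements from the text.

```python
import math


def sum_of_squared_deviations(xs):
    """SSx = sum of (x_i - mean)^2."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def helmert_components(xs):
    """q_j = (x_1 + ... + x_j - j*x_{j+1}) / sqrt(j(j+1)), j = 1, ..., n-1.
    For j = 1 and j = 2 these are the q_1 and q_2 of Eqs. (7.9) and (7.10)."""
    qs = []
    running = 0.0
    for j in range(1, len(xs)):
        running += xs[j - 1]
        qs.append((running - j * xs[j]) / math.sqrt(j * (j + 1)))
    return qs


sample = [3.0, 5.0, 10.0]            # hypothetical values, n = 3
qs = helmert_components(sample)
print("SSx            =", sum_of_squared_deviations(sample))
print("q_1^2 + q_2^2  =", sum(q * q for q in qs))
print("number of q's  =", len(qs))   # n - 1 degrees of freedom
```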

The expression “degrees of freedom” is also used to denote the number 
of independent comparisons which can be made between members of a 
sample. This is discussed in Sect. 10.6. 

Different members of the family of $\chi^2$ distributions with $\nu$ degrees of
freedom have different means. Sometimes it is desirable to make a trans-
formation so that all members of the transformed family have the same mean.
Since the mean of

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

is $\nu = n - 1$, the mean of

$$\frac{\chi^2}{n - 1} = \frac{s^2}{\sigma^2},$$

when a generalization of Theorem 2.3 is used, is 1. We say that the ratio
$s^2/\sigma^2$ is distributed as “$\chi^2$ per degree of freedom with $\nu$ degrees of freedom”
and denote this by “$\chi^2/\nu$ with $\nu$ d.f.” Graphs of three members of the
family of $\chi^2/\nu$ distributions are shown in Fig. 7.2, and values of
$\chi^2/\nu$ corresponding to selected values of $\alpha$ are shown in Table V. The
$\chi^2/\nu$ with $\nu$ d.f. distribution makes the comparison of two variances with
unequal sample sizes easier.


7.5. POWER CURVES FOR CHI-SQUARE TESTS


In Example 7.1 we mentioned that an experimenter would be unwise
to select the size of the type 1 error without taking into account the size
of the type 2 error. Some problems involving $\beta$ or $1 - \beta$ in a one-tailed
$\alpha$ level test are discussed in the following examples.

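Example 7.3. A random sample of size eleven is drawn from a normal
population with unknown variance $\sigma^2$. For null hypothesis $\sigma^2 = 10$ and
alternative hypothesis $\sigma^2 > 10$, (a) describe the appropriate one-sided
five per cent level test, (b) determine the power curve, and (c) use this curve
to find the size of the type 2 error when the true variance is actually 25.

Using the statistic $\chi^2 = (n - 1)s^2/\sigma_0^2 = 10s^2/10$ with ten degrees of freedom,
we find the critical point $\chi^2_{.05} = 18.31$. Thus, the critical region is made
up of all values of $\chi^2$ such that $\chi^2 > 18.31$. If $\chi^2 = s^2$ for a particular
sample falls in the critical region, we reject the null hypothesis and conclude
that $\sigma^2 > 10$; otherwise, we fail to reject the null hypothesis and conclude
that we do not have enough evidence to say $\sigma^2 > 10$.

The power of the test described above is given by

$$\begin{aligned}
p(\sigma^2) &= P(\text{sample } \chi^2 \text{ falling in critical region} \mid H_a \text{ true}) \\
&= P[\chi^2 > 18.31,\ \sigma^2 > 10] \\
&= P[(n - 1)s^2 > 10(18.31),\ \sigma^2 > 10]
\end{aligned}$$

or

$$p(\lambda^2) = P\left[\chi^2 > \frac{18.31}{\lambda^2},\ \sigma^2 > 10\right] \tag{7.11}$$

where $\lambda^2 = \sigma^2/\sigma_0^2 = \sigma^2/10$ and $\chi^2 = (n - 1)s^2/\sigma^2$ now has the $\chi^2$
distribution with ten degrees of freedom. From Eq. (7.11) it is clear that the
power increases with $\lambda^2$ or $\sigma^2$, since $18.31/\lambda^2$ decreases as $\lambda^2$ increases.
Further, since

$$P[\chi^2 > \chi^2_{1-\beta}] = 1 - \beta = p(\lambda^2),$$

it follows that

$$\chi^2_{1-\beta} = \frac{18.31}{\lambda^2}$$

or

$$\lambda^2 = \frac{18.31}{\chi^2_{1-\beta}} = \frac{\chi^2_{.05}}{\chi^2_{1-\beta}}. \tag{7.12}$$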
The power curve may be graphed as a function of $\lambda^2$ as well as a function
of $\sigma^2$. Corresponding to selected values of the power function $p(\lambda^2) = 1 - \beta(\lambda^2)$,
we use Table IV to find $\chi^2_{1-\beta}$, which is used in Eq. (7.12) to determine
$\lambda^2$. Thus, for ten degrees of freedom we find the following.


Table 7.1

    beta             .90     .75     .50     .25     .10     .05     .01
    chi^2_{1-beta}  15.99   12.55    9.34    6.74    4.87    3.94    2.56
    lambda^2         1.14    1.47    1.96    2.72    3.76    4.65    7.15


Plotting the points $(\lambda^2, 1 - \beta)$ or $(\lambda^2, p(\lambda^2))$ and connecting them by
a smooth curve, we obtain the power curve shown in Fig. 7.3. (The variances
$\sigma^2$ corresponding to selected values of $\lambda^2$ are indicated for the case where
the hypothesized variance is $\sigma_0^2 = 10$.)



Fig. 7.3. Power Curve for the One-Sided $\chi^2$ Test of $\sigma^2 = \sigma_0^2$
When $\nu = 10$ and $\alpha = .05$ ($\sigma^2 = 10\lambda^2$; $\sigma^2$ axis marked from 10 to 80).

Using the graph, we see that the power of the test of the null hypothesis
$\sigma^2 = \sigma_0^2$ is 0.78 when the true variance $\sigma^2$ is three times as large as $\sigma_0^2$.
In particular, if $\sigma_0^2 = 10$, then the probability of rejecting the null hypothesis
$\sigma^2 = 10$ when $\sigma^2 = 30$ is 0.78. Hence, the probability of failing to reject
the null hypothesis is $1 - 0.78 = 0.22$; that is, the probability of the type
2 error is 0.22.

We may find $\beta(\sigma^2)$ directly from the graph as $1 - p(\sigma^2)$. Thus, for
$\sigma^2 = 25$ we find $\beta(25) = 1 - p(25) = 1 - 0.675 = 0.325$. That is, in
roughly one-third of the cases where the true variance is 25 or greater,
we would conclude that the variance is not significantly larger than ten,
and this could be very serious in many experiments.
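
The power function of Eq. (7.11) can also be evaluated directly instead of
being read from a graph. The sketch below is illustrative and assumes a
series evaluation of the $\chi^2$ distribution function; computed this way the
power comes out at roughly 0.81 for $\lambda^2 = 3$ and 0.69 for $\lambda^2 = 2.5$,
close to but not identical with the graph readings quoted above.

```python
import math


def chi2_cdf(x, df):
    """P[chi-square(df) <= x] from the incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_upper_point(alpha, df):
    """Value with area alpha to its right."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_right_tailed(lam_sq, df, alpha=0.05):
    """p(lambda^2) = P[chi2 > chi2_alpha / lambda^2], as in Eq. (7.11)."""
    critical = chi2_upper_point(alpha, df)
    return 1.0 - chi2_cdf(critical / lam_sq, df)


# Power curve for nu = 10; sigma^2 = 10 * lambda^2 as in Fig. 7.3.
for lam_sq in (1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0):
    p = power_right_tailed(lam_sq, 10)
    print(f"lambda^2 = {lam_sq:3.1f}  sigma^2 = {10 * lam_sq:4.0f}  "
          f"power = {p:.3f}  beta = {1.0 - p:.3f}")
```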

Table 7.2

$\lambda^2 = \chi^2_{.05}/\chi^2_{1-\beta}$ for the one-sided five per cent level $\chi^2$ test of the null
hypothesis $\sigma^2 = \sigma_0^2$ against the alternative $\sigma^2 > \sigma_0^2$. (Useful in finding power,
size of type 2 error, and sample size.)

                             beta
     nu   chi^2_.05    .01     .05     .10     .25     .50     .80
      2      5.99     298.1    58.4    28.4    10.4    4.32    1.86
      3      7.81      67.9    22.2    13.4    6.44    3.30    1.68
      4      9.49      32.0    13.3    8.92    4.94    2.83    1.58
      5     11.07      20.0    9.66    6.88    4.13    2.54    1.52
      6     12.59      14.4    7.70    5.71    3.64    2.35    1.47
      8     15.51      9.42    5.67    4.44    3.06    2.11    1.41
     10     18.31      7.16    4.65    3.76    2.72    1.96    1.37
     12     21.03      5.89    4.02    3.34    2.49    1.85    1.33
     14     23.68      5.08    3.60    3.04    2.33    1.78    1.30
     16     26.30      4.53    3.30    2.82    2.21    1.71    1.28
     18     28.87      4.11    3.07    2.66    2.11    1.67    1.27
     20     31.41      3.80    2.89    2.52    2.03    1.62    1.26
     25     37.65      3.27    2.58    2.29    1.89    1.55    1.23
     30     43.77      2.93    2.37    2.13    1.79    1.49    1.21
     40     55.76      2.52    2.10    1.92    1.66    1.42    1.18
     50     67.50      2.27    1.94    1.79    1.57    1.37    1.16
     60     79.08      2.11    1.83    1.70    1.51    1.33    1.15
     70     90.53      1.99    1.75    1.64    1.47    1.31    1.14
     80    101.9       1.90    1.68    1.59    1.43    1.28    1.13
     90    113.1       1.83    1.64    1.54    1.40    1.27    1.12
    100    124.3       1.77    1.60    1.51    1.38    1.25    1.11


If it is required in a five per cent level one-sided test of the null hypothesis
$\sigma^2 = \sigma_0^2$ that $\beta$ be not greater than 0.10 when the true variance is $2.5\sigma_0^2$,
then the sample size must be considerably larger than 11. Actually, we must
find $n - 1$ such that the ratio $\lambda^2 = \chi^2_{.05}/\chi^2_{.90}$ is not greater than 2.5.
Table 7.2 may be used to determine the sample size. For $\nu = 20$, $\lambda^2 = 2.52$,
and for $\nu = 25$, $\lambda^2 = 2.29$. Using linear interpolation, we find $\lambda^2 = 2.48$
when $\nu = 21$. Thus, we feel safe in saying that a sample of size $n = \nu + 1$
$= 22$ will give the required protection against errors of types 1 and 2,
even though linear interpolation is not very good for $\nu$ less than 30.
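
The search for the required number of degrees of freedom can be automated.
The sketch below is an illustration, not the text's tabular method: it assumes
a series evaluation of the $\chi^2$ distribution function and simply increases
$\nu$ until the ratio $\chi^2_\alpha(\nu)/\chi^2_{1-\beta}(\nu)$ no longer exceeds the
specified $\lambda^2$. The name `smallest_df` is hypothetical.

```python
import math


def chi2_cdf(x, df):
    """P[chi-square(df) <= x] from the incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_upper_point(alpha, df):
    """Value with area alpha to its right."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lambda_sq(df, alpha, beta):
    """Ratio chi2_alpha(df) / chi2_{1-beta}(df), as tabulated in Table 7.2."""
    return chi2_upper_point(alpha, df) / chi2_upper_point(1.0 - beta, df)


def smallest_df(target_lambda_sq, alpha=0.05, beta=0.10, max_df=1000):
    """Smallest nu whose ratio does not exceed the target lambda^2."""
    for df in range(1, max_df + 1):
        if lambda_sq(df, alpha, beta) <= target_lambda_sq:
            return df
    raise ValueError("no df up to max_df meets the requirement")


nu = smallest_df(2.5, alpha=0.05, beta=0.10)
print(f"nu = {nu}, sample size n = {nu + 1}, ratio = {lambda_sq(nu, 0.05, 0.10):.3f}")
```

Under these assumptions the search returns the same $\nu$ as the interpolation
in the text, without relying on interpolation between tabled rows.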

Table 7.2 may also be used in constructing a family of power or operat-
ing characteristic curves which are adequate for most purposes. Eight power
curves are shown in Fig. 7.4. These curves may be of use in finding the ap-
proximate probability of rejecting the null hypothesis $\sigma^2 = \sigma_0^2$ when the
true variance has a value $\sigma_1^2$ such that $\sigma_1^2 > \sigma_0^2$.


7.6. SAMPLE SIZE FOR A TEST 

The curves in Fig. 7.4 may also be used to find the minimum sample
size in a five per cent level test which insures that $\beta$ does not exceed a
specified value for a particular value of $\sigma^2$. This is illustrated in the following
example.

Example 7.4. In Example 7.1, suppose the null hypothesis is $\sigma^2 = 0.0009$
and the alternative hypothesis is $\sigma^2 > 0.0009$. Further, suppose a random
sample is to be taken from machine A to determine whether the company
should go to the expense of buying a new machine. (a) Determine what
sample size should be drawn in order that $\alpha = 0.05$ and $\beta = 0.05$ when
$\sigma^2 = 0.0027$. (b) Discuss the consequences of making the type 1 or type 2
error.

When $\sigma_1^2 = 0.0027$ and $\sigma_0^2 = 0.0009$, $\lambda^2 = 3$. Since $\beta = 0.05$, the power
is to be 0.95 when $\sigma^2 = 0.0027$. Thus, with the aid of Fig. 7.4 we find, by
interpolation, that $\nu = n - 1 = 20$. This means that when a random sample
of size 21 is taken from machine A which actually has true variance 0.0027,
and a five per cent level one-sided $\chi^2$ test is applied, then the chance of failing
to reject the hypothesis $\sigma^2 = 0.0009$ is not more than 0.05. That is, the
probability of rejecting the null hypothesis is at least 0.95. For the one-sided
five per cent level test, the above can be stated in more general terms as “for
a sample of size 21 the chance of making a type 2 error is not more than
0.05 when $\sigma^2$ is not less than 0.0027.” As a matter of fact, the null hypothesis
$\sigma^2 = \sigma_0^2$ could also be stated as $\sigma^2 \le \sigma_0^2$ (for reasons similar to those given
following Example 6.10, which involved $\mu > 1000$). The sample size can
be found directly from Table 7.2 for fixed $\alpha$, $\beta$, and $\sigma^2$.
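
The size of the type 2 error for a given sample size can be checked by direct
computation. The sketch below is illustrative and assumes a series
evaluation of the $\chi^2$ distribution function; `type2_error` is a hypothetical
name. With $\lambda^2 = 3$ it suggests that $\beta$ is already slightly below 0.05 at
$n = 20$, so the graphical answer $n = 21$ appears to lie on the safe side.

```python
import math


def chi2_cdf(x, df):
    """P[chi-square(df) <= x] from the incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_upper_point(alpha, df):
    """Value with area alpha to its right."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def type2_error(n, lam_sq, alpha=0.05):
    """beta = P[chi2(n-1) <= chi2_alpha / lambda^2] for the right-tailed test."""
    df = n - 1
    return chi2_cdf(chi2_upper_point(alpha, df) / lam_sq, df)


lam_sq = 0.0027 / 0.0009          # ratio of the alternative to the null variance
for n in (19, 20, 21, 22):
    print(f"n = {n}: beta = {type2_error(n, lam_sq):.4f}")
```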

In answering (b), note that as a result of the test of the null hypothesis
one of two decisions must be made. Either decision can be correct or in
error. Associated with each decision is some action. If the decision is wrong,
the action is wrong. Thus, if machine A actually does cut pieces with the
precision of the standard, and the decision resulting from the test is that it
does not, a new machine will be unnecessarily purchased at considerable
expense. On the other hand, if the decision is that the null hypothesis can-
not be rejected, and machine A actually does cut pieces with a variance
greater than 0.0027, then the company produces substandard parts and is
in a position to lose customers at considerable expense to the company.

It should be observed at this point that the decision concerning the sizes
of $\alpha$ and $\beta$ is not one the consulting statistician should be called upon to
make. Indeed, he has no rule for fixing the maximum sizes of type 1 and
type 2 errors. This is in the hands of the investigator or supervisor. For
example, in the problem about the machines which cut pieces, someone
in a responsible position concerning money matters probably should
determine the sizes of $\alpha$ and $\beta$. If the expense to the company in making
a type 1 or type 2 error is about the same, then the size of $\alpha$ and $\beta$ should
be about the same. But the decision as to whether the size of these errors
should be 0.01, 0.05, 0.10, or something else is not the prerogative of the
statistician. Naturally, the common size of $\alpha$ and $\beta$ should be as small as
possible. However, there is a nonzero limit to the size. For the smaller
the probability of making an error, the larger the sample must be, and the
greater the expense to the company in time and money, among other things.
Clearly, someone other than the consulting statistician must make a deci-
sion on what values to assign $\alpha$ and $\beta$ in order to keep a “reasonable balance”
within the company.

In case the unnecessary expense incurred by making an error of type
2 is decidedly greater than that incurred in making a type 1 error, the size
of $\beta$ should be smaller than the size of $\alpha$. The question is, how much smaller?
Again, this is a problem for someone responsible for taking action result-
ing from the statistical test procedure. Even the problem of estimating how
expensive each type of error will be for the company is a complicated prob-
lem in accounting.

7.7. SUMMARY REMARKS

We have shown how to determine the appropriate sample size for
assigned values of $\alpha$, $\beta$, and $\lambda^2$. It is often the case that $\beta$ must be determined
when $\alpha$, $\lambda^2$, and $n$ are known; this is straightforward. In fact, it should
be clear by now that, when three of the four variables $\alpha$, $\beta$, $n$, and $\lambda^2$ are
arbitrarily selected, the fourth is uniquely determined, it being assumed
that random samples are drawn from a single normal population. For
example, when $\alpha$, $\lambda^2$, and $n$ are specified, $\beta$ may be uniquely determined.
We may use Formula (7.12) to find $\chi^2_{1-\beta} = \chi^2_\alpha/\lambda^2$ and then look in
Table IV with $\nu = n - 1$ degrees of freedom for this value.

The discussion so far has centered around the one-sided right-tailed
$\chi^2$ test. It is assumed that the student can give similar arguments for an
$\alpha$ level test of the null hypothesis $\sigma^2 = \sigma_0^2$ against the alternative hypothesis
$\sigma^2 < \sigma_0^2$. For example, the power function for such a test is given by

$$p(\sigma^2) = P\left[\chi^2(\nu) < \frac{\chi^2_{1-\alpha}(\nu)}{\lambda^2},\ \sigma^2 < \sigma_0^2\right] \tag{7.13}$$

where $\chi^2(\nu)$ denotes a random variable distributed as chi-square with
$\nu = n - 1$ degrees of freedom and $\lambda^2 = \sigma^2/\sigma_0^2$. Using Eq. (7.13), we may
construct a table similar to Table 7.2 and a family of curves similar to those
of Fig. 7.4. For a specified variance $\sigma_1^2 < \sigma_0^2$ and fixed error sizes $\alpha$ and
$\beta$ the sample size $n$ may be determined from

$$\lambda^2 = \frac{\sigma_1^2}{\sigma_0^2} = \frac{\chi^2_{1-\alpha}(n - 1)}{\chi^2_{\beta}(n - 1)}. \tag{7.14}$$

For a two-sided $\alpha$ level test of the null hypothesis $\sigma^2 = \sigma_0^2$ against
the alternative hypothesis $\sigma^2 \ne \sigma_0^2$ the methods are straightforward. Gener-
ally, the two critical points $\chi_1^2$ and $\chi_2^2$ are taken as the $\chi^2_{1-(\alpha/2)}(\nu)$
and $\chi^2_{\alpha/2}(\nu)$ points, respectively, of a chi-square distribution with
$\nu = n - 1$ degrees of freedom. Then the power function for the test is given
by

$$p(\sigma^2) = P\left[\chi^2(\nu) < \frac{\chi^2_{1-(\alpha/2)}(\nu)}{\lambda^2}\right] + P\left[\chi^2(\nu) > \frac{\chi^2_{\alpha/2}(\nu)}{\lambda^2}\right] \tag{7.15}$$

where $\lambda^2 = \sigma^2/\sigma_0^2$. If $\alpha$ and $\beta$ are fixed and the specified variance $\sigma_1^2$ is
smaller than $\sigma_0^2$ but not near $\sigma_0^2$, the sample size $n$ may be determined
approximately from

$$\frac{\sigma_1^2}{\sigma_0^2} = \frac{\chi^2_{1-(\alpha/2)}(\nu)}{\chi^2_{\beta}(\nu)}. \tag{7.16}$$

If $\alpha$ and $\beta$ are fixed and the specified variance $\sigma_1^2$ is larger than $\sigma_0^2$ but not
near $\sigma_0^2$, the sample size $n$ may be determined approximately from

$$\frac{\sigma_1^2}{\sigma_0^2} = \frac{\chi^2_{\alpha/2}(\nu)}{\chi^2_{1-\beta}(\nu)}. \tag{7.17}$$
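
The two-sided power function of Eq. (7.15) can be evaluated in the same way
as the one-sided one. The sketch below is illustrative only, assumes a series
evaluation of the $\chi^2$ distribution function, and uses the hypothetical name
`power_two_sided`; the choice $\nu = 15$ is arbitrary.

```python
import math


def chi2_cdf(x, df):
    """P[chi-square(df) <= x] from the incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_upper_point(alpha, df):
    """Value with area alpha to its right."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_two_sided(lam_sq, df, alpha=0.05):
    """Eq. (7.15): left-tail rejection probability plus right-tail one."""
    lower = chi2_upper_point(1.0 - alpha / 2.0, df)   # chi2_{1-(alpha/2)}
    upper = chi2_upper_point(alpha / 2.0, df)         # chi2_{alpha/2}
    left = chi2_cdf(lower / lam_sq, df)
    right = 1.0 - chi2_cdf(upper / lam_sq, df)
    return left + right


for lam_sq in (0.25, 0.5, 1.0, 2.0, 3.0):
    print(f"lambda^2 = {lam_sq:4.2f}  power = {power_two_sided(lam_sq, 15):.3f}")
```

At $\lambda^2 = 1$ the power equals $\alpha$, as it should, and it rises as the true
variance moves away from $\sigma_0^2$ in either direction.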


Table 7.2 and Fig. 7.4 were constructed for discussions involving a type 
1 error of size 0.05 in one-sided tests with critical region in the right tail. 
But these constructions are inadequate for tests of this type. Some experi- 
ments would require that Fig. 7.4 include many more curves and that 
other figures be drawn for values of a such as 0.005, 0.01, 0.10, 0.20. There 
is a similar need for one-sided tests with critical regions in the left tail 
and for two-sided tests. The student can find further information on $\chi^2$
tests, power and operating characteristic curves, and size of sample in Refs.
[1, 4, 6, 9, 10] at the end of this chapter.

7.8. EXERCISES

7.1. For a sample of size ten, $s^2 = 0.002285$. Find 95 per cent confidence
intervals for $\sigma^2$ and $\sigma$.

7.2. Graph the curve for the chi-square distribution when $\nu = 2$; when
$\nu = 3$.

7.3. Graph the curve for the $\chi^2/\nu$ distribution when $\nu = 2$; when $\nu = 3$.

7.4. A random sample of size 16 has variance 2.23. Use a two-sided five
per cent level test to determine whether the true variance is different
from 1.5.

7.5. The seven random observations 40, 53, 34, 48, 49, 35, 53 are from a
normal population with mean 50 and standard deviation 10. In order
to acquaint the student with errors made in testing hypotheses, pretend
that the population variance is unknown and use a five per cent level
two-sided test for the null hypothesis that the population variance is
(a) 100, (b) 99, (c) 101, (d) 20, and (e) 500. In each case state whether
the test conclusion is correct or whether a type 1 error or type 2 error
is made.

7.6. Prove Eqs. (7.9) and (7.10).

7.7. The lengths in inches of a random sample of eight parts cut off by
a certain machine are

0.823 0.793 0.609 0.781

0.790 0.813 0.802 0.797

The specified standard deviation is not to exceed 0.005 in. Test at the
five per cent level the hypothesis that the population standard deviation
is 0.005 in.

7.8. The tensile strengths in pounds of a random sample of ten pieces of
a certain type of yarn are

440 524 644 500 570

482 578 578 410 474

The specified standard deviation is not to exceed 60 lb. (a) Use a five
per cent level test to test the hypothesis that the population standard
deviation is 60 lb. (b) Find a 95 per cent confidence interval for the true
variance, letting the lower limit be zero. (c) Find the power curve for
the hypothesis in (a). (d) Use the curve found in (c) to determine the size
of the type 2 error when the true variance is actually 7500. (e) Determine
what sample size should be drawn in order that $\alpha = 0.05$ and $\beta = 0.10$
when $\sigma^2 = 900$.

7.9. (a) Draw the power curve for the test in Exercise 7.7; find $\sigma^2$ when
$\beta = 0.10$. (b) Determine what sample size should be drawn in order
that $\alpha = 0.01$ and $\beta = 0.05$ when $\sigma^2 = 0.004$.

7.10. (a) Draw the power curve for the test in Exercise 7.4 and use this curve
to find $\sigma^2$ when $\beta = 0.05$. (b) Find the smallest sample size for the test
in Exercise 7.4 for which $\alpha = 0.05$ and $\beta = 0.01$ when $\sigma^2 = 20$.
7.11. Suppose that $x_i$ ($i = 1, 2, 3$) is normally distributed with mean $\mu$ and
variance $\sigma^2$. Let $q_1$ and $q_2$ be the linear combinations defined following
Eqs. (7.9) and (7.10), respectively. (a) Find $E(q_j)$ ($j = 1, 2$). (b) Show
that $q_1$ and $q_2$ are independently distributed.

Hint. Show that the covariance of $q_1$ and $q_2$ is zero. (According to
Sect. 3.4.1, we know that the random variables $u$ and $v$ in a bivariate
normal distribution are independently distributed when the covariance
of $u$ and $v$ is zero.)

7.12. Use Eqs. (7.9) and (7.10) as guides in expressing

$$\sum_{i=1}^{4} (x_i - \bar{x})^2$$

as the sum of squares of three independent variables, $q_1$, $q_2$, and $q_3$.
Prove that cov $(q_1, q_3)$ = cov $(q_2, q_3) = 0$.

Hint. See Exercise 7.11.

7.13. Use Eqs. (7.9) and (7.10) and Exercise 7.12 as guides in expressing

$$\sum_{i=1}^{n} (x_i - \bar{x})^2$$

as the sum of squares of $n - 1$ independent variables, $q_1, q_2, \ldots, q_{n-1}$.
Prove that cov $(q_{n-2}, q_{n-1}) = 0$. If $x_i$ ($i = 1, \ldots, n$) is normally distributed
with mean $\mu$ and variance $\sigma^2$, what can you say about the joint density
function of $q_1, \ldots, q_{n-1}$?

7.14. Find the length of the shortest 95 per cent confidence interval for $\sigma^2$
when $\nu = 2$. What proportion of the $\chi^2$ values are greater than the
upper value?

7.15. (a) Show that the moment generating function of $\chi^2$, $M_{\chi^2}(\theta)$, is
$(1 - 2\theta)^{-\nu/2}$. (Note that the $t$ used in earlier chapters is replaced by $\theta$.
This is done to avoid confusion with the standard normal $t$.) (b) Show
that the moment generating function of the statistic $w$ given in
Theorem 7.1 is $(1 - 2\theta)^{-n/2}$. (c) Use (a) and (b) to prove
Theorem 7.1.

7.16. Use moment generating functions to prove Theorem 7.2.

7.17. Let $x_i$ ($i = 1, \ldots, n$) be normally distributed with mean $\mu$ and variance
$\sigma^2$. Let

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = q_1^2 + \cdots + q_{n-1}^2$$

where the $q$'s are defined in Sect. 7.4 and Exercise 7.13. (a) Prove that
$E(q_j) = 0$ and $V(q_j) = \sigma^2$ ($j = 1, \ldots, n - 1$). (b) Prove that
cov $(q_j, q_{j'}) = 0$ for $j \ne j'$ ($j' = 2, \ldots, n - 1$). (c) Prove Theorem 7.4.
(d) Prove cov $(\bar{x}, q_j) = 0$. (e) Prove Theorem 7.3.

Hint. Since $\bar{x}$ is independent of each $q_j$, it is independent of
$q_1^2 + \cdots + q_{n-1}^2$.

7.18. Prove

$$\sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2. \tag{7.18}$$

Then use the identity (7.18) and Theorems 7.1, 7.2, and 7.3 to prove
Theorem 7.4. (For further study of distributions of quadratic forms
see Refs. [1, 3, 7, 10].)


SAMPLING

FROM NORMAL POPULATIONS— 
THE STUDENT t DISTRIBUTION 


The Student t distribution is used to establish confidence intervals 
and test hypotheses when the population variance is not known and the 
sample size is small. Properties and uses of the power function are given. 
Tolerance intervals are introduced and compared with confidence intervals. 
The t distribution is used in a paired-observations experiment and is com- 
pared with the sign test. Model and observational equations are discussed. 


8.1. INTRODUCTION 

When the sample mean was used in Sect. 6.3 to obtain a confidence 
interval or to test a hypothesis about the population mean, it was necessary 
that the population variance be known or that the sample be large. In 
many experiments the population variance is not known and the sample 
size is so small that the sample variance cannot be used in place of the 
population variance. Fortunately, due to the Student t distribution, the
sample variance along with the sample mean may be used in making state- 
ments about the population mean. 

We now give statements of four important theorems relating to the
Student t distribution. Following these theorems we give illustrations of
applications and, in the set of exercises, outlines of proofs of these key
theorems.

Theorem 8.1. If $u$ has a normal distribution with mean 0 and variance
1, $\chi^2$ has a $\chi^2$-distribution with $\nu$ degrees of freedom, and $u$ and $\chi^2$ are in-
dependently distributed, then the random variable

$$t = \frac{u}{\sqrt{\chi^2/\nu}}$$

has density function

$$f(t; \nu) = \frac{\Gamma[(\nu + 1)/2]}{\sqrt{\pi\nu}\;\Gamma(\nu/2)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \qquad -\infty < t < \infty \tag{8.1}$$

with mean 0 and variance $\nu/(\nu - 2)$ for $\nu > 2$. The statistic $t$ is said to have
the Student t distribution, or simply the t distribution, with $\nu$ degrees of
freedom.

Theorem 8.2. If a random sample $x_1, x_2, \ldots, x_n$ is drawn from a normal
population with mean $\mu$ and variance $\sigma^2$, then the statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}},$$

where $s^2 = \sum (x_i - \bar{x})^2/(n - 1)$ and $\bar{x} = \sum x_i/n$, is distributed as the Student
t distribution with $n - 1$ degrees of freedom.

Theorem 8.3. As the number of degrees of freedom of $s$ approaches
infinity, the Student t distribution approaches the normal distribution with
mean 0 and variance 1.

Theorem 8.4. If two populations are normal and have the same variance,
then the statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s\sqrt{(1/n_1) + (1/n_2)}}$$

has the Student t distribution with $n_1 + n_2 - 2$ degrees of freedom. Random
samples $x_{11}, \ldots, x_{1n_1}$ and $x_{21}, \ldots, x_{2n_2}$ of sizes $n_1$ and $n_2$ are independently
drawn from populations 1 and 2 with true means $\mu_1$ and $\mu_2$, respectively. The
sample means are

$$\bar{x}_1 = \frac{\sum_j x_{1j}}{n_1}, \qquad \bar{x}_2 = \frac{\sum_j x_{2j}}{n_2}$$

and the pooled sample variance is

$$s^2 = \frac{\sum_j (x_{1j} - \bar{x}_1)^2 + \sum_j (x_{2j} - \bar{x}_2)^2}{n_1 + n_2 - 2} = \frac{\sum_i \sum_j (x_{ij} - \bar{x}_i)^2}{n_1 + n_2 - 2}. \tag{8.3}$$
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8.2. PROPERTIES OF THE f DISTRIBUTION 

Before considering applications of Theorems 8.1 through 8.4, we illus- 
trate certain properties of the t distribution. 

The density function (8.1) represents a one-parameter family of distri- 
butions. The graphs of three members of this family are shown in Fig. 
8.1, which illustrates the fact that the “shape” of the distribution changes 
with the degrees of freedom. This being the case, we cannot tabulate ap- 
propriate values using only a single simple standardization process as with 
the normal distribution. Thus, it is necessary that the probability that a 
value of t fall in any particular interval be computed for each degree of 
freedom. Since the t distribution is symmetric, and since we usually require
probabilities associated with the tail of the distribution, we let $t_\alpha(\nu)$ be
that positive value of t with $\nu$ degrees of freedom such that

$$P[t > t_\alpha(\nu)] = \alpha$$

where $\alpha$ is any number between 0 and 0.5. Values of $t_\alpha$ corresponding to
selected values of $\alpha$ are shown in Table VI. The probability that t is less
than $-t_\alpha(\nu)$, a value in the left tail, is given by

$$P[t < -t_\alpha(\nu)] = P[t > t_\alpha(\nu)] = \alpha$$

From Fig. 8.1 and Theorem 8.3, it is clear that the t distribution
approaches the normal distribution as $\nu$ approaches $\infty$. It is for this reason
that we sometimes call the normal distribution a t distribution with an
infinite number of degrees of freedom and include probability points
$t_\alpha = t_\alpha(\infty)$ in Table VI. Further, as the number of degrees of freedom
approaches one the t distribution is slower and slower in its rate of approach
to the t axis. That is, for any fixed point $t_0$ on the positive t axis, the
probability that t is greater than $t_0$ gets larger as $\nu$ approaches one.

The dispersion of the t distribution increases as the number of degrees
of freedom decreases. Since, according to Theorem 8.1, the variance is
$\nu/(\nu - 2)$, we observe that, as $\nu$ changes from $\infty$ to 5 to 3, the variance
changes from 1 to 5/3 to 3. For $\nu = 1$, the density function for the t distribution
of Eq. (8.1) reduces to

$$f(t; 1) = \frac{1}{\pi(1 + t^2)}, \qquad -\infty < t < \infty \tag{8.4}$$

since $\Gamma(1) = 1$ and $\Gamma(1/2) = \sqrt{\pi}$ according to Exercise 3.67. The density
function in Eq. (8.4) is a special case of the Cauchy density function. It
can be shown that the Cauchy distribution does not have finite mean and
variance. In fact, no moment of the Cauchy distribution is finite. The case
where $\nu = 2$ is considered in the exercises.
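
The properties just described can be checked numerically. The following is a minimal sketch, not a substitute for Table VI: it evaluates the density of Eq. (8.1) directly and integrates it with a simple trapezoidal rule. The helper names `t_density` and `trapezoid`, the upper integration limit of 100, and the step count are our own choices for illustration.

```python
import math

def t_density(t, nu):
    """Student t density of Eq. (8.1) with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)

def trapezoid(g, a, b, steps=50_000):
    """Trapezoidal approximation to the integral of g over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for i in range(1, steps):
        total += g(a + i * h)
    return total * h

# For nu = 1 the density at t = 0 is 1/pi, as in the Cauchy density of Eq. (8.4).
cauchy_at_zero = t_density(0.0, 1)

# Upper-tail probability beyond the fixed point 2.20 for several nu.
# The integral is truncated at 100, so each value is a slight underestimate.
tails = {nu: trapezoid(lambda t, nu=nu: t_density(t, nu), 2.20, 100.0)
         for nu in (2, 5, 11, 30)}

# Variance for nu = 5, to be compared with nu / (nu - 2) = 5/3.
var5 = 2 * trapezoid(lambda t: t * t * t_density(t, 5), 0.0, 100.0)

for nu, p in tails.items():
    print(f"nu = {nu:2d}: P[t > 2.20] is about {p:.4f}")
print(f"variance for nu = 5 is about {var5:.4f}")
```

The tail probability beyond 2.20 comes out close to 0.025 for eleven degrees of freedom and grows as the degrees of freedom decrease, which is the behavior described above.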

Theorem 8.2 is actually a corollary of Theorem 8.1. By Theorem 7.3
we know that for a sample of size n the sample mean $\bar{x}$ and the sample
variance $s^2 = SS/(n - 1)$ are independently distributed. Thus

$$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

and

$$\frac{(n - 1)s^2}{\sigma^2}$$

are independently distributed, $\mu$, $\sigma$, and n being constants. But these random
variables are distributed as $u$ and $v$, respectively, in Theorem 8.1, where
$\nu = n - 1$. Hence

$$t = \frac{(\bar{x} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n - 1)s^2}{\sigma^2} \Big/ (n - 1)}} = \frac{\bar{x} - \mu}{s/\sqrt{n}} \tag{8.5}$$

is distributed as the Student t variable with $n - 1$ degrees of freedom.
The reader should note that the degrees of freedom associated with t is the
same as the number of degrees of freedom associated with $s^2$.

The two statistics $\bar{x}$ and $s^2$ used to compute t may or may not be
determined from a single sample. If they are computed from samples of different
sizes, the number of degrees of freedom for the t statistic is the same as that
of $s^2$.


8.3. APPLICATIONS OF THE t DISTRIBUTION

The t statistic is used in very much the same way as the normal u, with
the exception that the sample variance is required instead of the population
variance. This leads to extra computations and more elaborate tables. The
following examples bring out the differences in the applications of the t 
and normal distributions. 

Example 8.1. A certain type of rat shows a mean gain of 65 grams 
during the first three months of life. Twelve rats were fed a particular diet 
from birth until age three months and the following weight gains in grams 
were observed: 55, 62, 54, 57, 65, 64, 60, 63, 58, 67, 63, and 61. (a) Is there 
reason to believe that the diet causes a change in the amount of weight 
gained? (b) Find a 95 per cent confidence interval for the mean weight
gained. 

We answer (a) by using the general procedure for testing hypotheses 
given on p. 218. 

1. $H_0$: $\mu = 65$ grams and $H_a$: $\mu \neq 65$ grams, since an increase in
   weight less than or greater than 65 grams indicates that the diet has
   some effect.

2. Assume that the twelve sample values were independently obtained
   from the same normal population. Let the significance level be
   $\alpha = 0.05$.

3. The statistic to be tested is

   $$t = \frac{\bar{x} - 65}{s/\sqrt{12}}$$

   since t values can be read directly from the tables.

4. For an experimenter who is equally concerned with detecting a gain
   in weight which is less than or greater than 65 grams, we take as
   the critical region all those values of t for which

   $$t < t_{.975}(11) = -2.20 \quad \text{or} \quad t > t_{.025}(11) = 2.20$$

5. Since $\bar{x} = 729/12 = 60.75$ and

   $$s^2 = \frac{12(44467) - (729)^2}{12(11)} = 16.38$$

   the calculated t statistic is

   $$t_c = \frac{60.75 - 65}{\sqrt{16.38/12}} = \frac{-4.25}{1.17} = -3.63$$

6. Since the sample statistic $t_c = -3.63$ falls in the left critical region,
   we reject the hypothesis that $\mu = 65$ grams and conclude that the
   diet does cause a decrease in the amount of weight gained,
   understanding that there is a five per cent chance of making a type 1 error.
   Actually, the probability of saying that there is a decrease in weight
   gained when $H_0$ is true is only 0.025.

The symmetric 95 per cent confidence limits are given by $\bar{x} \pm t_{.025}(11) \cdot s_{\bar{x}}$,
where $s_{\bar{x}} = \sqrt{s^2/n}$. From steps 4 and 5 above, we have $\bar{x} = 60.75$,
$t_{.025} = 2.20$, and $s_{\bar{x}} = 1.17$. Thus, the limits for the true weight gained
are 58.17 and 63.32. That is,

$$58.17 < \mu < 63.32 \tag{8.6}$$

Of course, the true mean either falls in the interval (8.6), or it does not.
The 95 per cent simply indicates our degree of confidence that the interval
in (8.6) includes the true mean.
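
The arithmetic of Example 8.1 can be reproduced in a few lines. This is a sketch under one stated assumption: the critical value 2.20 is copied from the text (Table VI) rather than computed, since the Python standard library has no t quantile function. Because the standard error is not rounded to 1.17 here, the statistic comes out near -3.64 rather than -3.63.

```python
import math
import statistics

gains = [55, 62, 54, 57, 65, 64, 60, 63, 58, 67, 63, 61]  # data of Example 8.1
mu0 = 65.0      # hypothesized mean gain in grams
t_crit = 2.20   # t_.025(11), taken from the text, not computed here

n = len(gains)
xbar = statistics.mean(gains)        # 729 / 12
s2 = statistics.variance(gains)      # sample variance with divisor n - 1
se = math.sqrt(s2 / n)               # estimated standard error of the mean
t_c = (xbar - mu0) / se              # calculated t statistic

reject = abs(t_c) > t_crit           # two-sided five per cent level test
ci = (xbar - t_crit * se, xbar + t_crit * se)

print(f"xbar = {xbar:.2f}, s^2 = {s2:.2f}, t_c = {t_c:.2f}, reject H0: {reject}")
print(f"95 per cent limits: {ci[0]:.2f} to {ci[1]:.2f}")
```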

If the alternative hypothesis in Example 8.1(a) were $\mu < 65$ grams
(or $\mu > 65$ grams), a one-tailed test should be used. Thus, if the significance
level of the test remains the same ($\alpha = 0.05$), the critical region is made
up of those values of t for which $t < -1.80$ (or $t > 1.80$). Hence, for a
sample value of t falling in the interval $-2.20 < t < -1.80$ or the interval
$t > 2.20$, the two tests (1) $H_0$: $\mu = 65$ grams against $H_a$: $\mu \neq 65$ grams and (2)
$H_0$: $\mu = 65$ grams against $H_a$: $\mu < 65$ grams lead to different conclusions.
This points up the importance of stating the correct alternative hypothesis
for an $\alpha$ level test. For example, if the diet actually does lead to a decrease
in mean weight gained, the second test would detect this more often.

8.4. POWER FOR THE t TEST

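Example 8.2. Find the power function for the five per cent level test
of the null hypothesis $H_0$: $\mu = 65$ grams against the alternative hypothesis
$H_a$: $\mu < 65$ grams, when twelve random observations are taken from
a normal population with unknown variance.

Clearly, the statistic to use is

$$t = \frac{\bar{x} - 65}{s/\sqrt{12}} \tag{8.7}$$

and the critical region for the test is $t < -1.80 = t_1$ or $\bar{x} < \bar{x}_1$,
where $\bar{x}_1 = t_1 \cdot s/\sqrt{12} + 65$. Following an argument similar to the one
used in deriving Eq. (6.20), we have

$$p(\mu; \mu < 65) = P[\bar{x} < \bar{x}_1;\ \mu < 65] = P\left[\frac{\bar{x} - \mu}{s/\sqrt{12}} < \frac{t_1 \cdot s/\sqrt{12} + 65 - \mu}{s/\sqrt{12}};\ \mu < 65\right]$$

or

$$p(\mu; \mu < 65) = P\left[t + \frac{\mu - 65}{s/\sqrt{12}} < t_1;\ \mu < 65\right] \tag{8.8}$$

or

$$p(\mu; \mu < 65) = P\left[t + \lambda\sqrt{\frac{11}{\chi^2}} < t_1\right] \tag{8.9}$$

where t in Eqs. (8.8) and (8.9) stands for $(\bar{x} - \mu)/(s/\sqrt{12})$, which has the
Student t distribution when $\mu$ is the true mean, and

$$\lambda = \frac{\mu - 65}{\sigma/\sqrt{12}} \quad \text{and} \quad \frac{\chi^2}{11} = \frac{s^2}{\sigma^2}$$

From Eq. (8.9) we see that the power function depends on the two random
variables t and $\chi^2$, each with eleven degrees of freedom.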

In general, we say that the random variable

$$t^* = t + \frac{\mu - \mu_0}{\sigma/\sqrt{n}}\sqrt{\frac{n - 1}{\chi^2}} \tag{8.10}$$

is distributed as the noncentral t with $n - 1$ degrees of freedom. Useful tables
for finding power are given in Tables of the Non-Central t Distribution by
Resnikoff and Lieberman, Stanford University Press, along with Refs.
[8, 9, 10]. [It is interesting to note that the power function obtained from
Eq. (6.20) may be written as

$$p(\mu; \mu < \mu_0) = 1 - \beta(\mu) = P\left[\bar{x} < \mu_0 - u_\alpha\frac{\sigma}{\sqrt{n}};\ \mu < \mu_0\right]$$

or

$$p(\mu; \mu < \mu_0) = P\left[u + \frac{\mu - \mu_0}{\sigma/\sqrt{n}} < -u_\alpha\right]$$

and that

$$u + \frac{\mu - \mu_0}{\sigma/\sqrt{n}}$$

is normally distributed with mean

$$\frac{\mu - \mu_0}{\sigma/\sqrt{n}}$$

and variance one.]

In general, the power function for the one-sided $\alpha$ level test of the
hypothesis $\mu = \mu_0$ against the alternative hypothesis $\mu < \mu_0$ with critical
region $t < t_1 = t_{1-\alpha}(n - 1)$ is given by

$$p(\mu; \mu < \mu_0) = P[t < t_1;\ \mu < \mu_0] = P[t^* < t_1] \tag{8.11}$$

where t and $t^*$ are defined by Eqs. (8.7) and (8.10), respectively, and

$$\lambda = \frac{\mu - \mu_0}{\sigma}\sqrt{n} = \Delta\sqrt{n} \tag{8.12}$$

with $\Delta = (\mu - \mu_0)/\sigma$. If the alternative hypothesis is $\mu > \mu_0$ with critical
region $t > t_2 = t_\alpha(n - 1)$, the power function is

$$p(\mu; \mu > \mu_0) = P[t > t_2;\ \mu > \mu_0] = P[t^* > t_2] \tag{8.13}$$

If the alternative hypothesis is $\mu \neq \mu_0$ and the critical region is defined by

$$t < t_{1-\alpha/2}(n - 1) = t_1 \quad \text{and} \quad t > t_{\alpha/2}(n - 1) = t_2$$

then the power function of the two-sided $\alpha$ level test of the hypothesis
$\mu = \mu_0$ is given by

$$p(\mu; \mu \neq \mu_0) = P[t^* < t_1] + P[t^* > t_2] \tag{8.14}$$

where $t^*$ is distributed as the noncentral t with $n - 1$ degrees of freedom
given in Eq. (8.10). To illustrate the nature of power curves along with
other properties associated with power, Tables 8.1 and 8.2 for right-sided
one-tailed tests are given.

Example 8.3. Graph the power function for the five per cent level test
in Example 8.2 and use the resulting curve to determine the power and
probability of making the type 2 error when the true mean is 61 and the
true variance is 16.

Use Table 8.1 with eleven degrees of freedom. Since the critical region
for the test is in the left tail of the t distribution, the values of $\lambda$ are negative.
The required power curve is shown in Fig. 8.2.


Table 8.1

Values of $\lambda = (\mu - \mu_0)\sqrt{n}/\sigma$ for Specified Degrees of Freedom $\nu$ and
Power p. One-sided Test with $\alpha = 0.05$.*

   ν \ p    0.10    0.20    0.30    0.40    0.50    0.60    0.70    0.80    0.90    0.95    0.99

     1      0.64    1.60    2.46    3.35    4.31    5.38    6.63    8.19   10.51   12.53   16.46
     2      0.50    1.15    1.63    2.07    2.49    2.92    3.40    3.98    4.81    5.52    6.88
     3      0.45    1.02    1.43    1.79    2.13    2.48    2.85    3.30    3.93    4.46    5.47
     4      0.43    0.96    1.34    1.67    1.99    2.30    2.64    3.04    3.60    4.07    4.95
     5      0.42    0.92    1.29    1.61    1.91    2.21    2.53    2.90    3.43    3.87    4.70
     6      0.41    0.90    1.26    1.57    1.86    2.15    2.46    2.82    3.33    3.75    4.55
     7      0.40    0.89    1.24    1.54    1.82    2.11    2.41    2.77    3.26    3.67    4.45
     8      0.40    0.88    1.22    1.52    1.80    2.08    2.38    2.73    3.21    3.62    4.38
     9      0.39    0.87    1.21    1.50    1.78    2.06    2.35    2.70    3.18    3.58    4.32
    10      0.39    0.86    1.20    1.49    1.77    2.04    2.33    2.67    3.15    3.54    4.28
    11      0.39    0.86    1.19    1.48    1.75    2.02    2.32    2.66    3.13    3.52    4.25
    12      0.38    0.85    1.19    1.47    1.74    2.01    2.30    2.64    3.11    3.50    4.22
    13      0.38    0.85    1.18    1.47    1.74    2.00    2.29    2.63    3.09    3.48    4.20
    14      0.38    0.84    1.18    1.46    1.73    2.00    2.28    2.62    3.08    3.46    4.18
    15      0.38    0.84    1.17    1.46    1.72    1.99    2.27    2.61    3.07    3.45    4.17
    16      0.38    0.84    1.17    1.45    1.72    1.98    2.27    2.60    3.06    3.44    4.16
    17      0.38    0.84    1.17    1.45    1.71    1.98    2.26    2.59    3.05    3.43    4.14
    18      0.38    0.83    1.16    1.45    1.71    1.97    2.26    2.59    3.04    3.42    4.13
    19      0.38    0.83    1.16    1.44    1.71    1.97    2.25    2.58    3.04    3.41    4.12
    20      0.38    0.83    1.16    1.44    1.70    1.97    2.25    2.58    3.03    3.41    4.12
    25      0.37    0.83    1.15    1.43    1.69    1.95    2.23    2.56    3.01    3.38    4.09
    30      0.37    0.82    1.15    1.42    1.68    1.94    2.22    2.54    2.99    3.37    4.06
     ∞      0.36    0.80    1.12    1.39    1.64    1.90    2.17    2.49    2.93    3.29    3.97

* This table is reproduced from J. Neyman and B. Tokarska, "Errors of the Second
Kind in Testing 'Student's' Hypothesis," Journal of American Statistical Association,
Vol. 31, p. 322, Table I, with the permission of the editor of the journal.


When $\mu = 61$, $\sigma^2 = 16$, $n = 12$, and $\mu_0 = 65$, we find that

$$\lambda = \frac{(61 - 65)\sqrt{12}}{4} = -3.46$$

From Fig. 8.2 we observe that the power of the test is

$$p(\mu = 61;\ \mu_0 = 65) = p(\lambda = -3.46) = 0.94$$

Thus the probability of making the type 2 error is

$$\beta(\mu = 61;\ \mu_0 = 65) = 1 - 0.94 = 0.06$$

Table 8.2

Values of $\lambda = (\mu - \mu_0)\sqrt{n}/\sigma$ for Specified Degrees of Freedom $\nu$ and
Power p. One-sided Test with $\alpha = 0.01$.*

* This table is reproduced from J. Neyman and B. Tokarska, "Errors of the Second Kind
in Testing 'Student's' Hypothesis," Journal of American Statistical Association, Vol.
31, p. 322, Table II, with the permission of the editor of the journal.

Fig. 8.2. Power Curve for the Left-Tailed One-Sided 5% Level t Test of
$\mu = \mu_0$ When $\nu$ = 2, 11, and $\infty$. $\lambda = (\mu - \mu_0)\sqrt{n}/\sigma$.

This means that for a normal population with true mean 61 and true standard
deviation of 4 we would fail to reject the null hypothesis $\mu = 65$ only six
per cent of the time in repeated experiments using random samples of size
twelve.
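
The phrase "in repeated experiments" suggests a direct check by simulation. The sketch below is our own illustration, not the method of the text, which reads the power from Fig. 8.2. It draws many samples of size twelve from a normal population with mean 61 and standard deviation 4, and counts how often the statistic of Eq. (8.7) falls below -1.80. The function name `simulated_power`, the number of trials, and the seed are arbitrary choices.

```python
import math
import random

def simulated_power(mu, sigma, n, mu0, t_crit, trials=20_000, seed=1):
    """Fraction of simulated samples whose t statistic falls below t_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)   # sum of squared deviations
        s = math.sqrt(ss / (n - 1))                 # sample standard deviation
        t = (xbar - mu0) / (s / math.sqrt(n))       # statistic of Eq. (8.7)
        if t < t_crit:
            rejections += 1
    return rejections / trials

lam = (61 - 65) * math.sqrt(12) / 4                 # noncentrality, about -3.46
power = simulated_power(mu=61, sigma=4, n=12, mu0=65, t_crit=-1.80)

print(f"lambda = {lam:.2f}")
print(f"simulated power = {power:.3f}, type 2 error = {1 - power:.3f}")
```

The simulated proportion should land near the value 0.94 read from the power curve, with the small run-to-run variation expected of any Monte Carlo estimate.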

The term “fail to reject the null hypothesis” may seem strange to the 
reader. It might appear that when a sample value falls in the noncritical 
region the term “accept the null hypothesis” is more appropriate, since we 
“accept the alternative hypothesis” otherwise. Two points may be made 
in this connection. First, when we accept the alternative hypothesis, we 
know (or can find) the probability of making an error, but if we “accepted 
the null hypothesis” we would not know the probability of making an error 
without first knowing the true mean and true variance in Example 8.3, say. 
(But if we knew the true mean, we would have no need to test a hypothesis 
about the mean in the first place.) Second, the test procedure concerns the 
“rejection” of the null hypothesis. Thus, if one fails to reject the null hy- 
pothesis, this simply means that there is not enough evidence, as a result 
of the experiment, to say what the true value really is — it could be the value 
assumed in the null hypothesis or some other value. That is, the statistical 
test may lead to the conclusion “reserve judgment.” 

But anyone who performs experiments knows that, due to practical 
considerations, one cannot always reserve judgment when a sample value 
falls in the noncritical region. There are situations in which the statistical 
conclusion “fail to reject the null hypothesis” actually means that the 
experimenter “accepts the null hypothesis.” That is, the experimenter 
concludes that the difference in the sample statistic and the hypothesized 
value of the parameter is small enough to be of no “practical significance.” 
Thus, when an experimenter accepts the null hypothesis, he does so out 
of practical considerations rather than statistical considerations. 

Tables 8.1 and 8.2 may be used to find power curves or operating char- 
acteristic curves for either left-tailed or right-tailed one-sided five per cent 
level or one per cent level tests. Curves for two-tailed tests may be obtained 
by using Eq. (8.14) and Ref. [11]. Also, these tables may be used to find
the minimum sample size in an $\alpha$ level test which insures that $\beta$ does not
exceed a specified value for a particular value of $\lambda$. Table E on pp. 606-
607 of Design and Analysis of Industrial Experiments by Davies, Hafner 
Publishing Co., may be used to find the sample size directly. This is illus- 
trated in the following example. 

Example 8.4. In the rat diet experiment (Example 8.1), suppose $H_0$:
$\mu = 65$ grams and $H_a$: $\mu \neq 65$ grams. Further, suppose that it is known
from past experience that $\sigma = 4$ grams and that a mean weight gain differing
less than three grams from 65 grams does not appreciably affect future
performance of the rats. What sample size should be taken if the probabilities
of error of the first and second kind are not to exceed 0.05?

Letting $\delta$ denote the difference it is important to detect, we have
$\Delta = \delta/\sigma = 3/4 = 0.75$. Thus, for a two-sided test with $\alpha = \beta = 0.05$, we
find that n = 26.
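
As a rough cross-check on the tabled value, one may use the familiar normal-theory approximation $n \approx [(z_{\alpha/2} + z_\beta)\sigma/\delta]^2$. This is not the method of the text, which reads n from a published table. The approximation treats $\sigma$ as known, so it should be expected to fall a little short of the value needed for a t test. The function name `approx_sample_size` is ours.

```python
import math
from statistics import NormalDist

def approx_sample_size(delta, sigma, alpha, beta, two_sided=True):
    """Normal-theory sample size; ignores the extra uncertainty in estimating sigma."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(1 - beta)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

n_approx = approx_sample_size(delta=3, sigma=4, alpha=0.05, beta=0.05)
print(n_approx)
```

The approximation gives 24, slightly below the n = 26 found for the t test, which is the direction one would expect.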

8.5. TOLERANCE LIMITS

Earlier we learned how to find two confidence limits which determine
a confidence interval for a population parameter. That is, we learned how
to find limits such that in repeated experiments of the same size a certain
proportion of intervals determined by these two limits cover a population
parameter. In some experiments it is not enough to find an interval which
$100\gamma$ per cent of the time covers a parameter value. Instead, we require
an interval which covers a portion of the population of values with
a specified confidence, say $\gamma$. Such intervals are called tolerance intervals,
and their endpoints are called tolerance limits.

For example, in the rat diet experiment of Example 8.1 we may wish
to find a 95 per cent tolerance interval for 90 per cent of the weights closest
to the mean of a normal population of weights. It is obvious that the 95
per cent tolerance interval which covers 90 per cent of the values specified
must be longer than the 95 per cent confidence interval which covers the
single value $\mu$. Since the 95 per cent confidence limits of $\mu$ may be expressed
as

$$\bar{x} \pm t_{.025}(n - 1) \cdot s_{\bar{x}} = \bar{x} \pm k \cdot s$$

where

$$k = \frac{t_{.025}(n - 1)}{\sqrt{n}}$$

we determine tolerance limits from

$$\bar{x} \pm K \cdot s \tag{8.15}$$

where K is determined so that the interval covers 90 per cent (P) of the
population of values with confidence 95 per cent ($\gamma$). Values of K are given
in Table 8.3. Thus for $\gamma = 0.95$, $P = 0.90$, and $n = 12$, we find K = 2.655.
Since $\bar{x} = 60.75$ and $s = \sqrt{16.38} = 4.05$, the 95 per cent tolerance limits
are $60.75 \pm (2.655)(4.05)$, or 50.00 and 71.50. Clearly the interval from
50.00 to 71.50 is considerably longer than the 95 per cent confidence interval
given by Eq. (8.6). Further note that 100 per cent of the weights in the
sample of Example 8.1 fall in the 95 per cent tolerance interval.
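
A short computation makes the contrast between the two kinds of interval concrete. It is a sketch that takes the factor K = 2.655 and the critical value 2.20 from the text as given, and uses the data of Example 8.1, whose sample mean is 729/12 = 60.75.

```python
import math
import statistics

gains = [55, 62, 54, 57, 65, 64, 60, 63, 58, 67, 63, 61]  # data of Example 8.1
K = 2.655       # tolerance factor for gamma = 0.95, P = 0.90, n = 12 (from the text)
t_crit = 2.20   # t_.025(11), from the text

n = len(gains)
xbar = statistics.mean(gains)
s = statistics.stdev(gains)          # sample standard deviation, divisor n - 1

k = t_crit / math.sqrt(n)            # multiplier of s in the confidence limits
conf_limits = (xbar - k * s, xbar + k * s)
tol_limits = (xbar - K * s, xbar + K * s)

# Proportion of the observed weights lying inside the tolerance interval.
inside = sum(tol_limits[0] <= x <= tol_limits[1] for x in gains) / n

print(f"k = {k:.3f} versus K = {K}")
print(f"confidence limits: {conf_limits[0]:.2f} to {conf_limits[1]:.2f}")
print(f"tolerance limits:  {tol_limits[0]:.2f} to {tol_limits[1]:.2f}")
```

The multiplier k is about 0.64 while K is 2.655, which is why the tolerance interval is roughly four times as long as the confidence interval.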

Table 8.3

Tolerance Factors for Normal Distributions*

                       $\gamma = 0.75$                               $\gamma = 0.90$
     n   P = 0.75    0.90    0.95    0.99   0.999      0.75    0.90    0.95    0.99   0.999

     2      4.498   6.301   7.414   9.531  11.920    11.407  15.978  18.800  24.167  30.227
     3      2.501   3.538   4.187   5.431   6.844     4.132   5.847   6.919   8.974  11.309
     4      2.035   2.892   3.431   4.471   5.657     2.932   4.166   4.943   6.440   8.149
     5      1.825   2.599   3.088   4.033   5.117     2.454   3.494   4.152   5.423   6.879
     6      1.704   2.429   2.889   3.779   4.802     2.196   3.131   3.723   4.870   6.188
     7      1.624   2.318   2.757   3.611   4.593     2.034   2.902   3.452   4.521   5.750
     8      1.568   2.238   2.663   3.491   4.444     1.921   2.743   3.264   4.278   5.446
     9      1.525   2.178   2.593   3.400   4.330     1.839   2.626   3.125   4.098   5.220
    10      1.492   2.131   2.537   3.328   4.241     1.775   2.535   3.018   3.959   5.046
    11      1.465   2.093   2.493   3.271   4.169     1.724   2.463   2.933   3.849   4.906
    12      1.443   2.062   2.456   3.223   4.110     1.683   2.404   2.863   3.758   4.792
    13      1.425   2.036   2.424   3.183   4.059     1.648   2.355   2.805   3.682   4.697
    14      1.409   2.013   2.398   3.148   4.016     1.619   2.314   2.756   3.618   4.615
    15      1.395   1.994   2.375   3.118   3.979     1.594   2.278   2.713   3.562   4.545
    16      1.383   1.977   2.355   3.092   3.946     1.572   2.246   2.676   3.514   4.484
    17      1.372   1.962   2.337   3.069   3.917     1.552   2.219   2.643   3.471   4.430
    18      1.363   1.948   2.321   3.048   3.891     1.535   2.194   2.614   3.433   4.382
    19      1.355   1.936   2.307   3.030   3.867     1.520   2.172   2.588   3.399   4.339
    20      1.347   1.925   2.294   3.013   3.846     1.506   2.152   2.564   3.368   4.300
    21      1.340   1.915   2.282   2.998   3.827     1.493   2.135   2.543   3.340   4.264
    22      1.334   1.906   2.271   2.984   3.809     1.482   2.118   2.524   3.315   4.232
    23      1.328   1.898   2.261   2.971   3.793     1.471   2.103   2.506   3.292   4.203
    24      1.322   1.891   2.252   2.959   3.778     1.462   2.089   2.489   3.270   4.176
    25      1.317   1.883   2.244   2.948   3.764     1.453   2.077   2.474   3.251   4.151
    26      1.313   1.877   2.236   2.938   3.751     1.444   2.065   2.460   3.232   4.127
    27      1.309   1.871   2.229   2.929   3.740     1.437   2.054   2.447   3.215   4.106
    30      1.297   1.855   2.210   2.904   3.708     1.417   2.025   2.413   3.170   4.049
    35      1.283   1.834   2.185   2.871   3.667     1.390   1.988   2.368   3.112   3.974
    40      1.271   1.818   2.166   2.846   3.635     1.370   1.959   2.334   3.066   3.917
    45      1.262   1.805   2.150   2.826   3.609     1.354   1.935   2.306   3.030   3.871
    50      1.255   1.794   2.138   2.809   3.588     1.340   1.916   2.284   3.001   3.833
    55      1.249   1.785   2.127   2.795   3.571     1.329   1.901   2.265   2.976   3.801
    60      1.243   1.778   2.118   2.784   3.556     1.320   1.887   2.248   2.955   3.774
    65      1.239   1.771   2.110   2.773   3.543     1.312   1.875   2.235   2.937   3.751
    70      1.235   1.765   2.104   2.764   3.531     1.304   1.865   2.222   2.920   3.730
    75      1.231   1.760   2.098   2.757   3.521     1.298   1.856   2.211   2.906   3.712
    80      1.228   1.756   2.092   2.749   3.512     1.292   1.848   2.202   2.894   3.696
    85      1.225   1.752   2.087   2.743   3.504     1.287   1.841   2.193   2.882   3.682
    90      1.223   1.748   2.083   2.737   3.497     1.283   1.834   2.185   2.872   3.669
    95      1.220   1.745   2.079   2.732   3.490     1.278   1.828   2.178   2.863   3.657
   100      1.218   1.742   2.075   2.727   3.484     1.275   1.822   2.172   2.854   3.646
   110      1.214   1.736   2.069   2.719   3.473     1.268   1.813   2.160   2.839   3.626
   120      1.211   1.732   2.063   2.712   3.464     1.262   1.804   2.150   2.826   3.610
   130      1.208   1.728   2.059   2.705   3.456     1.257   1.797   2.141   2.814   3.595
   140      1.206   1.724   2.054   2.700   3.449     1.252   1.791   2.134   2.804   3.582
   150      1.204   1.721   2.051   2.695   3.443     1.248   1.785   2.127   2.795   3.571
   160      1.202   1.718   2.047   2.691   3.437     1.245   1.780   2.121   2.787   3.561
   170      1.200   1.716   2.044   2.687   3.432     1.242   1.775   2.116   2.780   3.552
   180      1.198   1.713   2.042   2.683   3.427     1.239   1.771   2.111   2.774   3.543
   190      1.197   1.711   2.039   2.680   3.423     1.236   1.767   2.106   2.768   3.536
   200      1.195   1.709   2.037   2.677   3.419     1.234   1.764   2.102   2.762   3.529
   250      1.190   1.702   2.028   2.665   3.404     1.224   1.750   2.085   2.740   3.501
   300      1.186   1.696   2.021   2.656   3.393     1.217   1.740   2.073   2.725   3.481
   400      1.181   1.688   2.012   2.644   3.378     1.207   1.726   2.057   2.703   3.453
   500      1.177   1.683   2.006   2.636   3.368     1.201   1.717   2.046   2.689   3.434
   600      1.175   1.680   2.002   2.631   3.360     1.196   1.710   2.038   2.678   3.421
   700      1.173   1.677   1.998   2.626   3.355     1.192   1.705   2.032   2.670   3.411
   800      1.171   1.675   1.996   2.623   3.350     1.189   1.701   2.027   2.663   3.402
   900      1.170   1.673   1.993   2.620   3.347     1.187   1.697   2.023   2.658   3.396
  1000      1.169   1.671   1.992   2.617   3.344     1.185   1.695   2.019   2.654   3.390
     ∞      1.150   1.645   1.960   2.576   3.291     1.150   1.645   1.960   2.576   3.291

* Reproduced from C. Eisenhart, M. W. Hastay, and W. A. Wallis, Techniques of
Statistical Analysis. New York: McGraw-Hill, Inc. Chap. 2, Table 2.1, pp. 102-
107, with permission of the publisher.


8.6. COMPARISON OF MEANS OF TWO SETS OF INDEPENDENT
OBSERVATIONS

Problems often occur in practice which involve the comparison of
two population means. If random samples $x_{11}, \ldots, x_{1n_1}$ and $x_{21}, \ldots, x_{2n_2}$
of sizes $n_1$ and $n_2$ are independently drawn from normal populations with
means $\mu_1$ and $\mu_2$, respectively, and common unknown variances $\sigma_1^2 = \sigma_2^2
= \sigma^2$, then the t distribution with $n_1 + n_2 - 2$ degrees of freedom can be
applied in obtaining confidence intervals for and testing hypotheses of the
difference $\delta = \mu_1 - \mu_2$. If in Theorem 8.4 we let

$$d = \bar{x}_1 - \bar{x}_2 \quad \text{and} \quad s_d = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

the t statistic in Eq. (8.2) becomes

$$t = \frac{d - \delta}{s_d} \tag{8.16}$$

where $\delta = \mu_1 - \mu_2$. From Eq. (8.16) we can obtain confidence intervals
for the difference in means $\delta$ in exactly the same way as we did for the single
mean $\mu$ with unknown variance by replacing $\bar{x}$ by $d$, $\mu$ by $\delta$, $s^2$ by $s_p^2$, $1/n$
by $(1/n_1 + 1/n_2)$, and the degrees of freedom $n - 1$ by $n_1 + n_2 - 2$. Thus,
the symmetric $100(1 - \alpha)$ per cent confidence limits for $\delta$ are given by

$$d \pm t_{\alpha/2}(n_1 + n_2 - 2) \cdot s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \tag{8.17}$$

The test of the hypothesis $\delta = \delta_0$ against $\delta \neq \delta_0$ is similar to that given in
Example 8.1. In order to bring out certain differences in the test procedures
with one and two means we illustrate with a one-sided t test.

Example 8 . 5 . The mean breaking strength of a product made under 
standard conditions is to be compared to that of the same product made 
under different (new) conditions. (The reader may think in terms of compres- 
sive strength of bricks made in a laboratory, the tensile strength of yarn 
made by A, or the resistance of a sheet of metal to a certain force.) Assume 
that random samples of size ten had the following (coded) means and 
variances: $\bar{x}_1 = 8.37$ lb, $\bar{x}_2 = 9.62$ lb, $s_1^2 = 1.32$, and $s_2^2 = 1.18$. The
standard is denoted by the subscript 1. Is the mean breaking strength under
new conditions greater than that under standard conditions?

Using the general procedure for testing hypotheses, we have:

1. $H_0$: $\mu_1 = \mu_2$ (or $\delta = \mu_1 - \mu_2 = 0$) and $H_a$: $\mu_1 < \mu_2$ (or $\delta = \mu_1 -
   \mu_2 < 0$), since we are interested in knowing if the new conditions
   lead to greater breaking strength.

2. Assume that samples are independently drawn from two normal
   populations with common unknown variance $\sigma^2$. Let the significance
   level be $\alpha = 0.01$, $n_1 = n_2 = 10$.

3. The test statistic is

   $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_d} = \frac{\bar{x}_1 - \bar{x}_2}{s_d} = \frac{d}{s_d}$$

   with 18 degrees of freedom.

4. Since the experimenter is interested in detecting increased breaking
   strength, that is, negative difference $\delta$, we take as the critical region
   all those values of t for which $t < -2.55$.

5. Since $d = 8.37 - 9.62 = -1.25$ lb and

   $$s_d = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{\frac{9(1.32) + 9(1.18)}{18}\left(\frac{1}{10} + \frac{1}{10}\right)} = 0.50$$

   the calculated t statistic is

   $$t_c = \frac{-1.25}{0.50} = -2.50$$

6. The sample statistic $t_c$ falls in the noncritical region. Therefore, we
   do not have enough evidence to say that the breaking strength under
   the new conditions is greater than that under standard conditions.
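
The computation in Example 8.5 uses only summary statistics, so it is easily wrapped in a small function. The sketch below follows Eqs. (8.3) and (8.16). The function name `pooled_t` is ours, and the critical value -2.55 is the tabled lower one per cent point of t with 18 degrees of freedom, entered by hand rather than computed.

```python
import math

def pooled_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta0=0.0):
    """Two-sample t statistic with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df   # pooled variance, Eq. (8.3)
    s_d = math.sqrt(sp_sq * (1 / n1 + 1 / n2))           # standard error of d
    t = (xbar1 - xbar2 - delta0) / s_d                   # Eq. (8.16)
    return t, df, s_d

t_c, df, s_d = pooled_t(8.37, 9.62, 1.32, 1.18, 10, 10)
t_crit = -2.55                  # lower 1 per cent point of t with 18 df (from a t table)
reject = t_c < t_crit           # one-sided test of mu_1 = mu_2 against mu_1 < mu_2

print(f"s_d = {s_d:.2f}, t_c = {t_c:.2f} with {df} degrees of freedom, reject H0: {reject}")
```

The statistic falls just short of the critical value, so the hypothesis is not rejected at the one per cent level. The closeness of -2.50 to -2.55 also shows how much the conclusion depends on the chosen significance level.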

When the population variances are unequal and unknown, the methods
just described for finding confidence intervals and testing hypotheses
involving differences of two means are not appropriate. An approximate
solution to the problem of unequal variances, known as the Behrens-Fisher
problem, is described below and is given by Welch [12] and Aspin [2].

If we let $r_1 = s_1^2/n_1$ and $r_2 = s_2^2/n_2$, it can be shown that

$$t' = \frac{d - \delta}{\sqrt{r_1 + r_2}} \tag{8.18}$$

is approximately distributed as t with $\nu$ degrees of freedom, where


The number of degrees of freedom computed by Eq. (8.19) is likely not
to be a whole number. In this case it should be rounded off to the nearest
integer when used in determining confidence limits or critical regions.
Even if the computed $\nu$ is an integer, the $t'$ statistic is not distributed as any
t statistic. In some cases it may be better to replace the $t'$ statistic by some
other, more appropriate, approximate statistic (as illustrated below).

The ratio in Eq. (8.18) can be used as the t ratio to find approximate
confidence limits and to make approximate tests of hypotheses. For example,
symmetric $100(1 - \alpha)$ per cent confidence limits are given by

$$d \pm t_{\alpha/2}(\nu)\sqrt{r_1 + r_2} \tag{8.20}$$

where $t_{\alpha/2}(\nu)$ is the value of t obtained from Table VI for the integral
degrees of freedom given by Eq. (8.19), and the null hypothesis $H_0$: $\delta = \delta_0$
can be tested against the alternative hypothesis $H_a$: $\delta \neq \delta_0$ by rejecting
$H_0$ if the computed statistic $t'_c$ is greater in absolute value than $t_{\alpha/2}(\nu)$.
Actually, exact values of the five per cent and one per cent points of $t'$ are
given by Aspin [3], but the approximate corresponding t values are generally
satisfactory where the $t'$ statistic applies.
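
To make the procedure concrete, the sketch below computes the ratio with separate variance estimates together with an approximate number of degrees of freedom. An important assumption: the degrees of freedom are computed from the widely used Welch-Satterthwaite expression $(r_1 + r_2)^2 / [r_1^2/(n_1 - 1) + r_2^2/(n_2 - 1)]$, which may differ in detail from the form printed as Eq. (8.19). The function name `welch_t` is ours, and the summary statistics of Example 8.5 are reused purely for illustration.

```python
import math

def welch_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta0=0.0):
    """Approximate t ratio for unequal variances, with approximate degrees of freedom."""
    r1, r2 = s1_sq / n1, s2_sq / n2
    t_prime = (xbar1 - xbar2 - delta0) / math.sqrt(r1 + r2)
    # Welch-Satterthwaite form (an assumption; the text's Eq. (8.19) may differ).
    nu = (r1 + r2) ** 2 / (r1 ** 2 / (n1 - 1) + r2 ** 2 / (n2 - 1))
    return t_prime, nu

t_prime, nu = welch_t(8.37, 9.62, 1.32, 1.18, 10, 10)
nu_rounded = round(nu)          # rounded to the nearest integer for table lookup

print(f"t' = {t_prime:.2f}, nu = {nu:.2f}, rounded to {nu_rounded}")
```

With equal sample sizes and nearly equal sample variances, the ratio agrees with the pooled statistic and the approximate degrees of freedom are close to $n_1 + n_2 - 2$, so the two procedures lead to the same conclusion here.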

If the population variances are unknown, an experimenter seldom 
knows whether they are equal or unequal. Thus, the methods described 
for equal variances are usually applied without knowing for certain that 
the variances are equal. However, this is not necessarily bad, for Box [4] 
found that no serious consequence will result if the population variances 
are only moderately different and the two sample sizes are equal. This 
means that the methods described for equal variances may be used as
approximations in place of the approximate $t'$ in some cases where $t'$ is
considered appropriate.

It is informative to note how Theorem 8.4, which involves two means,
is related to earlier theorems. We may write Eq. (8.3) as

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Hence, from Eq. (7.8) we see that

$$\frac{(n_1 + n_2 - 2)s_p^2}{\sigma^2} \tag{8.21}$$

is distributed as $\chi^2$ with $n_1 + n_2 - 2$ degrees of freedom. Further, from
Theorem 6.6, we know that $d = \bar{x}_1 - \bar{x}_2$ is normally distributed with mean
$\delta = \mu_1 - \mu_2$ and variance

$$\sigma_d^2 = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$

Thus

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{d - \delta}{\sigma_d} \tag{8.22}$$

is normally distributed with zero mean and unit variance. According to
Theorem 7.3, $\bar{x}_1$ and $s_1^2$ as well as $\bar{x}_2$ and $s_2^2$ are independent random
variables. Thus, it follows that $d$ and $s_p^2$ are independent random variables
and that

$$\frac{d - \delta}{\sigma_d} \quad \text{and} \quad \frac{(n_1 + n_2 - 2)s_p^2}{\sigma^2} \tag{8.23}$$

are also independent random variables. Hence, according to Theorem
8.1, the ratio

$$t = \frac{(d - \delta)/\sigma_d}{\sqrt{\dfrac{(n_1 + n_2 - 2)s_p^2}{\sigma^2} \Big/ (n_1 + n_2 - 2)}} = \frac{d - \delta}{s_d} \tag{8.24}$$

is distributed as the Student t with $n_1 + n_2 - 2$ degrees of freedom.

If there is evidence that two population variances $\sigma_1^2$ and $\sigma_2^2$ are not
equal, it is possible in some cases, by pairing observations, to use exact
methods to examine the difference in the population means. This very
important but special case is described in the next section.


8.7. PAIRED OBSERVATIONS

Due to extraneous causes, it may happen that two means are declared
different when there is actually no difference in the means. The reverse
may also occur; that is, one may fail to recognize real differences because
of the appearance of factors other than those of interest. For example,
we may wish to compare two methods of determining starch content of
potatoes. If several potatoes have very different starch content, and Method
1 is used on potatoes with low starch content whereas Method 2 is used
on potatoes with high starch content, we could conclude that Method 2
is superior to Method 1 when this is not the case. In another example, we
may wish to compare two analysts in their ability to measure the percentage
of ammonia in a gas used in a certain manufacturing process. Unless the
analysts measure the percentage of ammonia at the same time and place
on the same number of days, any comparison of the abilities of the analysts
is likely to be meaningless.

In each of these examples, it is clear that paired observations should
be made. In the starch experiment each potato should be cut into two parts,
forming a pair. The decision as to which method to apply to one part of
each pair could be made by tossing a coin. In the ammonia experiment the
analysts should take samples at the same time and in roughly the same place,
the exact place being determined by the toss of a coin, say. In each case
we try to make sure that two members of a pair are alike in all respects.
Then when the observations resulting from the experiment are made, we
may ascribe any difference, except for random variation, to the factor we
are trying to measure.

Let $x_{1j}$ and $x_{2j}$ denote the observed values of the $j$th pair of a set of
$n$ paired values. Assume that the differences $d_j = x_{1j} - x_{2j}$ represent a
sample of $n$ random observations from a normal population with mean
$\delta$ and variance $\sigma_d^2$. Denote the sample mean and variance by $\bar{d}$ and $s_d^2$,
respectively. If there are extraneous factors affecting the $j$th pair, we
assume that they affect both observations of the pair in exactly the same
way and that subtraction removes the effect. With these assumptions it
follows that

$$t = \frac{\bar{d} - \delta}{s_d/\sqrt{n}} \tag{8.25}$$

is distributed as $t$ with $n - 1$ degrees of freedom. An illustration of the test
procedure for the hypothesis $\delta = 0$ is given in the following example.

Example 8.6. Two methods of measuring the percentage of starch in
potatoes are to be compared; the data given by C. von Scheele, G. Svensson,
and J. Rasmusson in "Om Bestamning av Potatisens Starkelse och
Torrsubstanshalt med Tillhjalp av dess specifika Vikt," Nordisk Jordbrugsforskning,
1935, p. 22, are to be used. Sixteen potatoes with very different starch
content were taken, and the two methods of measurement were applied to
each potato.

Using the general procedure for testing hypotheses, we have:

1. $H_0$: $\delta = 0$ against $H_a$: $\delta \ne 0$ is used, since we wish to know if the two
methods are different. Note that the mean difference $\delta$ is also the
difference in the means, $\mu_1$ and $\mu_2$, of the two populations of
measurements, i.e., $\delta = \mu_1 - \mu_2$.

2. Assume that the sample of differences is randomly drawn from a
normal population with mean $\delta = 0$ and unknown variance $\sigma_d^2$. Let
the significance level be $\alpha = 0.05$.

3. The test statistic is

$$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$

with 15 degrees of freedom.

4. The critical region is made up of all those values of $t$ for which
$t < -2.13$ or $t > 2.13$.

5. The data in the percentage of starch experiment are given in Table 8.4,
along with the differences $d_j = x_{1j} - x_{2j}$. Since $\sum d_j = 1.2$ and
$\sum d_j^2 = 0.52$, we have $\bar{d} = 0.075$ and $s_d = 0.17$, and the computed $t$ value is

$$t_c = \frac{(0.075)(4)}{0.17} \approx 1.76$$

6. But $t_c$ fails to fall in the critical region. Thus, we fail to reject the null
hypothesis that there is no difference in the two methods of measuring
the percentage of starch in potatoes.
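
The arithmetic of steps 3 through 6 can be checked with a short Python sketch. It is only an illustration, not part of the original example: the function name `paired_t` is hypothetical, the list holds the sixteen differences of Example 8.6 (they reproduce the quoted sums $\sum d_j = 1.2$ and $\sum d_j^2 = 0.52$), and the critical value 2.13 is simply the one quoted in step 4.

```python
import math

# The sixteen differences d_j = x_1j - x_2j of Example 8.6.
differences = [0.2, 0.0, 0.0, 0.1, 0.2, 0.2, 0.3, -0.3,
               0.1, 0.2, 0.3, 0.0, -0.1, 0.1, -0.2, 0.1]


def paired_t(d, delta0=0.0):
    """Return (mean, sample standard deviation, t) for H0: delta = delta0."""
    n = len(d)
    d_bar = sum(d) / n
    # Sample variance of the differences, with n - 1 in the denominator.
    s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
    t = (d_bar - delta0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t


d_bar, s_d, t_c = paired_t(differences)
T_CRITICAL = 2.13  # two-sided five per cent point for 15 d.f., as quoted in step 4

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.2f}, t = {t_c:.2f}")
print("reject H0" if abs(t_c) > T_CRITICAL else "fail to reject H0")
```

Carrying $s_d$ at full precision gives a $t$ value slightly above the hand computation, which rounds $s_d$ to 0.17 first; the decision is the same either way.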


Table 8.4

Potato      Percentage Starch                    Difference
Number      Method 1 (x_1)    Method 2 (x_2)     (d)

   1            21.7              21.5              0.2
   2            18.7              18.7              0.0
   3            18.3              18.3              0.0
   4            17.5              17.4              0.1
   5            18.5              18.3              0.2
   6            13.6              13.4              0.2
   7            17.0              16.7              0.3
   8            16.6              16.9             -0.3
   9            14.0              13.9              0.1
  10            17.2              17.0              0.2
  11            21.7              21.4              0.3
  12            18.6              18.6              0.0
  13            17.9              18.0             -0.1
  14            17.7              17.6              0.1
  15            18.3              18.5             -0.2
  16            13.6              13.5              0.1


The paired comparisons test is often used when one is comparing two
measurements on the same individual or object. Thus, the experimental
design consists in taking $n$ individuals, making a measurement on each one,
and then after some treatment making a second measurement in the same
unit as the first. The point is to determine if any real difference, on the
average, occurs as a result of the treatment. Thus, the difference in
measurements for each individual is obtained, and these differences constitute a
sample which is used to test the hypothesis that there is no real difference.
Clearly, the test is designed to measure any effect the treatment might have.
There are often obvious differences from individual to individual which do
not affect the test.

It should be observed that the set of differences resulting from pairing
may be treated in the same way as a set of observations in Sects. 8.2, 8.3,
and 8.4. That is, confidence limits, tests of hypotheses, size of the type 2
error, power, and size of sample may be established in exactly the same
way, since the assumptions concerning the $d$'s are the same as those about
the $x$'s.

However, it is not always possible to think only in terms of the sample
differences. For example, when testing $\mu_1 = \mu_2$ (or $\delta = \mu_1 - \mu_2 = 0$) at
the $\alpha$ level we may wish to compare the paired $t$ test with the two
independent samples $t$ test given in Sect. 8.6. We have already noted the
most important advantage of the paired $t$ test in case extraneous factors
exist: the test for paired observations is based only on the
variation in the differences, so that other variations which have exactly
the same effect on both members of a pair do not affect the measure of
sample standard deviation, as they do in the two independent samples
$t$ test.

In order to make further comparisons of the two test procedures, the
usual assumptions are that $x_{1j}$ and $x_{2j}$ $(j = 1, 2, \ldots, n)$ are observations
from normal populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$,
respectively. For the paired $t$ test we do not assume that $x_{1j}$ and $x_{2j}$ are
independent nor that the variances are equal, as is the case for the two
independent samples $t$ test. Actually, we do not need to assume that the
variances $\sigma_1^2$ and $\sigma_2^2$ remain constant throughout the experiment. All we
need assume is that the sum $\sigma_1^2 + \sigma_2^2$ remains constant for the $n$ paired
observations.

On the other hand, if there are no extraneous factors in an experiment, 
the two independent samples t test is to be preferred, since it is more powerful 
than the paired t test. This is so because the number of degrees of freedom, 
$2n - 2$, for the independent samples test is twice as great as the number of
degrees of freedom, $n - 1$, for the paired $t$ test. The difference in power
(or probability of the type 2 error) is considerable for small n, but is small 
when the sample size is moderately large, say n = 12. 


8.8. THE SIGN TEST 

In light of the discussion in Sect. 2.5 concerning rounding-off errors,
the reader might well be disturbed by the appearance of only one significant
figure in the column of differences in Example 8.6. This practice should
be avoided if possible. However, there are other, less powerful tests which
can be applied in cases where there is only one significant figure and the
experimenter has some reservations about using the appropriate paired
$t$ test because relative errors in magnitude can be quite large. These tests
do not depend on the size of numerical values of the differences, but only
on their sign or rank order. Further, the assumption of normality is not
required in order that these new tests be valid. We illustrate how the sign
test of the hypothesis $\delta = 0$ may be applied to the data of Example 8.6.
This test does not even require that the variance $\sigma_1^2 + \sigma_2^2$ remain constant
from pair to pair and is one of the simplest of all tests to apply.

Example 8.7. Use the sign test and the data of Example 8.6 to test the
hypothesis that the two methods of measuring the percentage of starch
in potatoes are the same.

The total number of minus signs is $r = 3$, and the total number of
plus and minus signs is $n = 13$ (if we ignore the three zeros). For the five
per cent two-sided test we need two or fewer minus signs, according to
Table 8.5, before we can reject the hypothesis $\delta = 0$. Since $r = 3$ is greater
than $k = 2$, we fail to reject the hypothesis $\delta = 0$ and reach the same
conclusion as in Example 8.6.


Table 8.5

Critical Values of k for the Sign Test
(Table gives largest integral values of $k$ such that $P[x \le k] \le \alpha/2$,
where $x$ has the binomial distribution with $p = 1/2$.)

If there is no real difference between the two methods,
we expect roughly half of the differences to be negative. If a small
proportion of the differences are either positive or negative, we suspect that there
might be a real difference in the methods of determination of starch. We
wish to determine just how small the proportion should be before we say
the methods are different. Since this proportion depends on the sample
size, we find the number of positive or negative signs, whichever is smaller,
necessary for the rejection of the hypothesis $\delta = 0$ at the $\alpha$ level.

In general, if we assume any pair of values $(x_{1j}, x_{2j})$ $(j = 1, \ldots, n)$
to be randomly drawn from the same distribution, we expect $d_j = x_{1j} - x_{2j}$
to be positive half the time and to be negative half the time in repeated
samples. That is, the null hypothesis is that the difference $d_j$ has a distribution
with median zero or, which is the same thing, true proportion of positive
(or negative) signs equal to $p = 1/2$. If we think only in terms of the signs, this
means that the $+$ and $-$ signs have a dichotomous distribution with
$p = 1/2$. Thus, regardless of the nature of the distribution from which the
$j$th pair is drawn, we expect the $+$ and $-$ signs to have a dichotomous
distribution with $p = 1/2$. Hence, in $n$ independent trials in which a positive
or negative sign for each pair is determined and for which the probability
of a positive (negative) sign on each trial is $p = 1/2$, the probability of $x$
positive (negative) signs is given by the binomial density function

$$b(x) = \binom{n}{x}\left(\frac{1}{2}\right)^{n} \tag{8.26}$$

Table 8.5 gives the critical value $k$, an integer, such that

$$P[x \le k] = \sum_{x=0}^{k} b(x) \le \frac{\alpha}{2}$$

Thus, in testing the null hypothesis $p = 1/2$ against $p \ne 1/2$, the significance
level of the test is actually less than the level indicated in Table 8.5. This
is due to the fact that the binomial is a discrete distribution. Note that
the density function $f(x)$ from which the pair of observations $x_{1j}$ and $x_{2j}$
is drawn is generally continuous.

The hypotheses $p = 1/2$ and $\delta = \mu_1 - \mu_2 = 0$ are equivalent, and the
alternative hypothesis $\delta \ne 0$ is equivalent to $p \ne 1/2$. For a one-sided test
the alternative hypothesis is $p < 1/2$ ($p > 1/2$) or its equivalent $\delta < 0$ (or
$\delta > 0$). In this case, for the significance level $\alpha$ we enter Table 8.5 in the
column headed by $2\alpha$, since only half of the probability indicated is in the
lower tail (upper tail) of the binomial distribution.
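
The critical values tabulated for the sign test follow directly from the binomial density with $p = 1/2$. The Python sketch below shows one way to compute them; the function names `binom_half_cdf` and `critical_k` are hypothetical, and the rule used is the largest $k$ whose lower-tail probability does not exceed $\alpha/2$.

```python
from math import comb


def binom_half_cdf(k, n):
    """P[x <= k] for x binomial with n trials and p = 1/2."""
    return sum(comb(n, x) for x in range(k + 1)) / 2 ** n


def critical_k(n, alpha):
    """Largest integer k with P[x <= k] <= alpha/2; None if no such k exists."""
    k = None
    for candidate in range(n + 1):
        if binom_half_cdf(candidate, n) <= alpha / 2:
            k = candidate
        else:
            break  # the lower-tail probability only grows with k
    return k


# Example 8.7: n = 13 nonzero differences, r = 3 minus signs.
n, r = 13, 3
k = critical_k(n, 0.05)
print(f"k = {k}, P[x <= k] = {binom_half_cdf(k, n):.4f}")
print("reject H0" if r <= k else "fail to reject H0")
```

For a one-sided test at level $\alpha$, the same function would be called with $2\alpha$ as its second argument, which mirrors entering the table in the column headed $2\alpha$.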

8.9. COMPARISON OF THE t TEST AND THE SIGN TEST 

The conclusions resulting from the $t$ test and the sign test of the null
hypothesis $\delta = 0$ against the alternative hypothesis $H_a$ are not always the
same. The $t$ test is to be preferred to the sign test, provided all assumptions
for the paired $t$ test hold, since it is more powerful. However, some of the
assumptions for the paired $t$ test may not be valid. In this case, the sign
test can generally be applied, since all one really need assume is that each
pair of values be drawn randomly from the same population (populations
may differ from pair to pair).

It is informative to compare the power of the sign test and the paired
$t$ test of the null hypothesis $\delta = 0$ against the alternative hypothesis $\delta \ne 0$
under assumptions which make both tests valid. We make the comparison
in terms of the so-called power efficiency, $100\,n_t/n$, of the sign test relative
to the $t$ test. That is, under the assumption that random paired values are
drawn from normal populations, the power efficiency in percentage is the
ratio $100\,n_t/n$, where $n_t$ is the sample size for a paired $t$ test which gives the
same power as a sign test based on a sample of size $n$. The power efficiency
for the sign test decreases (1) with increasing sample size; (2) with increasing
$|\delta|$; and (3) with increasing $\alpha$. If random pairs are drawn from two normal
populations with means $\mu_1$ and $\mu_2$ and common variance $\sigma^2$, the power
efficiency of the sign test is given in Table 8.6, where $\Delta = |\delta|/(\sqrt{2}\,\sigma)$ and
$\sqrt{2}\,\sigma$ is the standard deviation of a difference of two observations. Thus,
according to Table 8.6, if the true means are ten and four and the variance


Table 8.6

Power Efficiency of Sign Test Relative to t Test for Normal Populations*

                         Δ
  n      α       Near 0    0.5    1.0    1.5    2.0

  5    0.0625      96      96     95     93     91
 10    0.0020      94      92     90     87     84
 10    0.0215      85      84     82     80     77
 10    0.1094      77      76     74     72
 20    0.0118      76      75     73     70
 20    0.0414      73      72     70     68
 20    0.1153      70      69     67     65
  ∞                100(2/π) = 63.7

* Reproduced from W. J. Dixon and F. J. Massey, Introduction to Statistical Analysis,
2nd ed., New York: McGraw-Hill, Inc., p. 285, Table 17.3, with permission of the
publisher.


is eight, then for a 2.15 per cent level test with $n = 10$ we find

$$\Delta = \frac{10 - 4}{\sqrt{2}\,\sqrt{8}} = 1.5$$

and $100\,n_t/10 = 80$, or $n_t = 8$. That is, under the conditions indicated, a sign
test with ten random differences has about the same power as a paired $t$
test with eight random differences. Further, if $n$ is large and $\Delta$ is not large,
the $t$ test requires approximately 64 per cent as many observations as the
sign test in order to have the same power.

The above statements are based on tests in which no differences are 
zero (i.e., no ties in pairs). Actually, when data are drawn from continuous 
distributions, ties should not occur. However, this is not the case in practical 
work, due to limitations of measuring instruments and rounding off. In 
such cases, ties can either not be counted, thus decreasing the sample size, 
or can be counted as half plus or half minus without seriously affecting 
the significance test, provided the proportion of ties is not too large. In 
Example 8.7 ties were not counted. 
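
The two ways of handling ties described above can be compared on the differences of Example 8.6. The following Python sketch is an illustration under stated assumptions: the helper names `critical_k` and `sign_counts` and the policy labels `"drop"` and `"split"` are hypothetical, and the critical value is taken as the largest $k$ with lower-tail binomial probability at most $\alpha/2$.

```python
from math import comb


def critical_k(n, alpha):
    """Largest k with P[x <= k] <= alpha/2 for x binomial(n, 1/2); None if none."""
    k, tail = None, 0
    for x in range(n + 1):
        tail += comb(n, x)
        if tail / 2 ** n <= alpha / 2:
            k = x
        else:
            break
    return k


def sign_counts(differences, ties="drop"):
    """Return (count of the less frequent sign, effective sample size)."""
    plus = sum(1 for d in differences if d > 0)
    minus = sum(1 for d in differences if d < 0)
    zeros = len(differences) - plus - minus
    if ties == "drop":   # ties not counted: the sample size shrinks
        return min(plus, minus), plus + minus
    if ties == "split":  # each tie counted as half plus and half minus
        return min(plus, minus) + zeros / 2, len(differences)
    raise ValueError("ties must be 'drop' or 'split'")


differences = [0.2, 0.0, 0.0, 0.1, 0.2, 0.2, 0.3, -0.3,
               0.1, 0.2, 0.3, 0.0, -0.1, 0.1, -0.2, 0.1]

for policy in ("drop", "split"):
    r, n = sign_counts(differences, policy)
    k = critical_k(n, 0.05)
    decision = "reject H0" if k is not None and r <= k else "fail to reject H0"
    print(f"{policy}: r = {r}, n = {n}, k = {k}, {decision}")
```

With three ties out of sixteen pairs, both conventions lead to the same decision here.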

The sign test can be modified and extended to include many other 
problems. We shall return to its use in relation to other nonparametric 
tests in Chap. 16. Some of the tests of Chap. 16 which take into account 
the rank order of the observations might also be applied in place of the 
paired t test and sign test. 


8.10. EXERCISES 

8.1. On the examination in a certain course 16 students had a mean grade of 
79 and a standard deviation of 8. (a) Use a five per cent level test to de- 
termine whether there is reason to believe that the true mean is greater 
than 75. (b) Find a 95 per cent confidence interval for the true mean. 

8.2. The bacteria content of a food product must be less than 62.0 to be 
acceptable. A sample of nine cans from a lot of the product has a mean 
of 62.5 and a standard deviation of 0.3. (a) Should the lot be rejected 
on the basis of the sample evidence? Use a five per cent level test, (b) 
Find a 90 per cent confidence interval for the true mean. 

8.3. (a) Use a five per cent level two-sided test to determine if the random 
sample with measurements 

55 42 52 61 76 50 56 56 38 71 
could have been taken from a normal population with mean 50. (b) Find 
a 95 per cent confidence interval for the true mean, (c) Find the power 
function for the five per cent level one-sided test of the null hypothesis 
$H_0$: $\mu = 50$ against the alternative hypothesis $H_a$: $\mu > 50$, when ten
random observations are taken from a normal population. 

8.4. (a) A lot of rolls of paper is acceptable for making bags for grocery 
stores if its mean breaking strength is not less than 40 lb. A random 
sample of 20 pieces of paper from the lot had a mean breaking strength 
of 39 lb with a standard deviation of 2.4 lb. Should the lot be rejected 
if a = 0.1? (b) Draw the power curve for the test in (a). What is the 
chance of nonrejection of a lot which has true mean breaking strength 
of 39 lb? (c) Draw on the same graph with (b) the power curve for the 
test in (a) when the sample size is ten rather than 20. Use these two 
curves to make a statement about the effect of the sample size on the 
test if $\beta < 0.05$. (d) Suppose that the specification that the true mean
breaking strength be 40 lb allows the true mean breaking strength to
be as small as 39.6 lb without rejection of the lot. For the one per cent
level test that the mean breaking strength not be smaller than 40 lb,
determine the sample size so that for a mean of $\mu = 39.6$ the
probability of the type 2 error is at most 0.10.

8.5. (a) Find the density function for the $t$ distribution when $\nu = 2$. (b)
Graph the curve for the distribution in (a). (c) Prove that the mean is
zero for the distribution in (a).

8.6. Prove Eq. (8.14).

8.7. Find a 90 per cent tolerance interval for 75 per cent of the grades closest
to the mean in Exercise 8.1.

8.8. Find a 99 per cent tolerance interval for 95 per cent of the pieces of paper
closest to the mean in Exercise 8.4.

8.9. Random samples are drawn from two normal populations with the
same variance. Twenty observations in sample one have mean $\bar{x}_1 = 46$
and variance $s_1^2 = 120$. Eighteen observations in sample two have mean
$\bar{x}_2 = 39$ and variance $s_2^2 = 180$. (a) Is there a significant difference
between the two sample means? Use a five per cent level test. (b) Find
a 90 per cent confidence interval for the difference.

8.10. Two manufacturers A and B make the same gauge of copper wire.
The measurements of tensile strength of random samples, after 5000 lb
has been subtracted from each, are given in Table 8.7.

Table 8.7

A    110    90   120   115   105    50    75    85
B    130    50    40    45    45   120    50

Find a 95 per cent confidence interval for the difference in true means.
8.11. Ten albino rats are used to study the effectiveness of carbon tetrachloride
as an anthelmintic. Each rat received an injection of 500 Nippostrongylus
muris larvae. After eight days the rats were divided into two groups,
and each rat received via a stomach tube a dose of carbon tetrachloride
dissolved in mineral oil. Each rat in one group received a dose of 0.032 cc
and those in the other group a dose of 0.063 cc. Two days later the
rats were killed and the adult worms were recovered and counted.
Table 8.8 shows the number of adults recovered from each rat.

Table 8.8

0.032 cc    421   462   400   378   413
0.063 cc    207    17   412    74   116

Use Eq. (8.18) to find a 95 per cent confidence interval for the difference
in effectiveness between the two doses. [Source: Whitlock and Bliss,
"A Bioassay Technique for Antihelminthics," The Journal of Parasitology,
Vol. 29 (1943), pp. 48-58.]

8.12. If a machined part of a certain sort is accurate to within ±0.1 in. of
specification, it can be used. Deviations from specification of a random
sample of 12 such parts were as follows: -0.03, -0.01, +0.02, -0.01,
+0.06, +0.04, -0.05, +0.03, +0.02, -0.06, -0.02, +0.01. What
proportion of the population sampled can one be 90 per cent confident
of being between -0.1 and +0.1 of specification?

8.13. Thirty-six boys in the same class in high school were divided into 18 
pairs of almost equal I. Q. One member of each pair was randomly 
selected and assigned to group $G_1$. The remaining 18 members were
assigned to group $G_2$. Both groups were taught mathematics by the
same instructor, but different methods of instruction were used. At the 
end of the semester all students were given the same examination with 
the following resulting grades 


Table 8.9

Pair No.    Method 1       Method 2

   1        [illegible]    [illegible]
   2            59             71
   3            56             52
   4            94             68
   5            84             68
   6            81             85
   7            66             79
   8        [illegible]    [illegible]
   9            59             64
  10            56             39
  11            88             77
  12            88             83
  13            75             62
  14            75             74
  15            72             74
  16            81             83
  17            84             73
  18            78             70


(a) Test the hypothesis that both methods are equally suited to the 
instructor. Use a five per cent level t test, (b) Test the hypothesis in 
(a), using the sign test, (c) Find a 90 per cent confidence interval for 
the difference in means of the two methods. 

8.14. Each of 16 samples of a material is divided into two equal parts. A 
standard analysis, $A_1$, is applied to one half of each sample, and a new
analysis, $A_2$, is applied to the other half in order to determine the
percentage of a certain mineral. The percentages are


Table 8.10

Sample No.    Analysis A_1    Analysis A_2

    1         [illegible]     [illegible]
    2            31.52           31.57
    3            34.04           34.08
    4            24.30           24.32
    5            26.48           26.38
    6            23.95           23.93
    7            27.63           27.63
    8            25.11           25.20
    9            27.90           27.99
   10            22.20           22.23
   11            27.62           27.66
   12            24.44           24.47
   13            28.22           28.31
   14            32.52           32.45
   15            26.12           26.27
   16            22.83           22.83

(a) Test at the five per cent level to determine whether the new method
of analysis gives higher percentages than the standard method. Use
both the $t$ test and the sign test. (b) How can the type 1 error be made in
this experiment? How can the type 2 error be made? What are the
consequences of each error? (c) Find a 95 per cent confidence interval
for the difference in means of the two analyses. (d) Discuss the power
efficiency of the sign test relative to the $t$ test for the conditions given in
(a).

8.15. Prove Eq. (8.1) in Theorem 8.1.

Hint: Since $u$ and $w$ are independently distributed, the joint density
function of $u$ and $w$ is obtained by multiplying the density function of
$u$ by the density function of $w$. Then, using the relation $t = u/\sqrt{w/\nu}$,
find the joint density function of $t$ and $w$. By the methods of Sect. 5.4
and Exercise 5.35, the marginal density function of $t$ is found. This is
the density function given in Eq. (8.1).

8.16. Prove Theorem 8.3.

Hint: Use Stirling's approximation (see Exercise 3.67).

8.17. Prove Theorem 8.4.

Hint: Use theorems of Chaps. 6 and 7 to reduce Eq. (8.2) to the
form $u/\sqrt{w/\nu}$, where $u$ and $w$ are defined as in Theorem 8.1.

8.18. Show that the variance of the $t$ distribution with $\nu$ degrees of freedom
is $\nu/(\nu - 2)$ when $\nu > 2$.

8.19. In preparing control charts of means $\bar{x}_i$, it often happens that the true
mean $\mu$ and variance $\sigma^2$ are not known. In such a case $\mu$ and $\sigma^2$ are
replaced by the unbiased estimates $\bar{x} = \sum n_i \bar{x}_i / \sum n_i$ and
$s_p^2 = \sum (n_i - 1)s_i^2 / \sum (n_i - 1)$
obtained from the sample means and variances, and the chart is prepared
in the usual manner.

For random samples of size five drawn each hour for 24 consecutive
hours the means and variances are as follows:


Table 8.11

Sample    Mean      Variance      Sample    Mean      Variance
          (x̄_i)     (s_i^2)                 (x̄_i)     (s_i^2)

   1      11.03      0.589          13      11.64      2.093
   2      11.17      2.251          14      11.16      4.017
   3      11.10      1.347          15      11.36      1.556
   4      11.16      2.231          16      12.19      1.585
   5      10.03      0.419          17      12.00      0.330
   6      10.71      0.392          18      11.10      1.623
   7      11.30      0.873          19      11.24      2.018
   8      10.93      2.041          20      11.14      3.173
   9      11.17      0.927          21      10.89      1.322
  10      10.63      1.239          22      11.23      0.766
  11      10.71      1.249          23      10.43      3.345
  12      10.93      2.222          24      10.87      2.893

(a) Find $\bar{x}$ and $s_p$. Draw the upper and lower control limits for sample
means so that they are three standard deviations from the center line.

(b) Plot the 24 sample means. Does the process appear to be out of
control at any point?

8.20. The control limits about the mean (center line) differ with sample size. 
The mean, variance, and standard deviation for random samples in 15 
consecutive weeks are shown in the table below. Find x and Sp. Draw 
the center line and the upper and lower control limits for each sample 
and plot the points (sample means). Does the process appear to be out of 
control at any point? 


Table 8.12

Sample    Sample    Mean      Variance    Standard
Number    Size      (x̄_i)     (s_i^2)     Deviation (s_i)

   1        11      11.03      0.589        0.767
   2         7      11.17      2.251        1.500
   3        11      11.10      1.347        1.161
   4         8      11.16      2.231        1.494
   5         5      10.03      0.419        0.647
   6         6      10.71      0.392        0.626
   7        12      11.30      0.873        0.934
   8         9      10.93      2.041        1.428
   9        10      11.17      0.927        0.963
  10         8      10.63      1.239        1.113
  11         6      10.71      1.249        1.118
  12        12      10.93      2.222        1.491
  13        10      11.64      2.093        1.446
  14         9      11.16      4.017        2.004
  15         7      11.36      1.556        1.247


8.11. MODEL EQUATIONS AND OBSERVATIONAL EQUATIONS

In much of statistics we are concerned with the mean and variance or 
standard deviation for both the population and sample. (This is especially 
true when the population or populations are normal.) Since extensions and 
generalizations of concepts already introduced are of considerable importance 
in later sections, we now take a closer look at assumptions and notations 
useful for this purpose. 

The reader, no doubt, is already aware of the fact that each observation
is considered to be the sum of two parts, namely, the mean and the deviate
from the mean. It is the component parts of an observation which we wish
to discuss in this section. We restrict our attention to cases where the com- 
ponents of an observation can be added. 

To start with, observe that when we say that a random variable $x$ is
distributed with mean $\mu$ and variance $\sigma^2$, it is understood that $x$ is the sum
of $\mu$ and the deviate of $x$, where the deviate may be zero, positive, or
negative. Letting $x_j$ denote the $j$th observation in a sample of $n$ objects and
$\epsilon_j$ its deviate from the mean, we write

$$x_j = \mu + \epsilon_j \qquad (j = 1, 2, \ldots, n) \tag{8.27}$$

If the population is continuous, $n$ represents a very small part of all possible
observations. Even when the population is finite and of size $N$, the sample
is likely to represent only a small portion of the population; otherwise
we might just as well study all observations in the population in order to
make an exact statement about the population parameters and not take
the chances involved in making inference statements based on a sample.
In any case, it is generally reasonable to assume that the sample mean

$$\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j$$

is not equal to $\mu$ for most samples, and that $\bar{x}$ has a distribution of values
with mean $\mu_{\bar{x}} = \mu$. Further, since $\bar{x}$ is not always equal to $\mu$, and since
$\mu$ is seldom known, we may wish to think of an observation as the sum of
$\bar{x}$ and the deviate of $x$ from $\bar{x}$. That is, if $e_j$ denotes the amount $x_j$ deviates
from $\bar{x}$, the sample mean, we may write

$$x_j = \bar{x} + e_j \tag{8.28}$$

where $\bar{x}$ estimates $\mu$ and $e_j$ estimates $\epsilon_j$. Note that the estimator $\bar{x}$ of $\mu$ is
based on $n$ observations, but the estimator $e_j$ of $\epsilon_j$ is based on only one
observation. However, we are not particularly interested in $e_j$ as an estimator
of $\epsilon_j$, but rather in using all $e_j$'s to find an estimator of the variance or
standard deviation. We know already that

$$s^2 = \frac{\sum_{j=1}^{n} e_j^2}{n - 1}$$

is used to estimate $\sigma^2$ and

$$s = \sqrt{\frac{\sum_{j=1}^{n} e_j^2}{n - 1}}$$

to estimate $\sigma$. Equation (8.27) is called a model equation and Eq. (8.28) an
observation equation. Each equation indicates that an observation is the
sum of two components: in Eq. (8.27) the components are the true mean
$\mu$ and the true deviate or true random error or true error effect $\epsilon_j$ of the
$j$th observation, and in Eq. (8.28) the components are the estimated mean
or sample mean $\bar{x}$ and the estimated deviate or residual or error $e_j$.

There is another point of considerable interest. In Eq. (8.27) $\mu$ is
constant, and $\epsilon_j$ is a random variable. Also, $x_j$ is a random variable. However,
in Eq. (8.28) the random variable $x_j$ is the sum of two random variables
$\bar{x}$ and $e_j$. For a particular sample, $\bar{x}$ has only one value, but from sample
to sample $\bar{x}$ takes other values. In other words, it is possible for $\bar{x}$ to take
any one of many values before a particular set of $n$ observations are drawn.
Since $\mu$ is not usually known, we use Eq. (8.28) in applications; Eq. (8.27)
serves as a model (or ideal).
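
The observation equation can be made concrete with a few lines of Python. This is a minimal sketch, not part of the original text: the sample values are chosen only for illustration, and the variable names are hypothetical. It splits each observation into the sample mean and a residual, then forms the variance estimate from the residuals.

```python
import math

# Illustrative sample (an assumption; any list of numbers would do).
sample = [27, 18, 10, 4, 19, 30, 11]

n = len(sample)
x_bar = sum(sample) / n                    # estimated mean
residuals = [x - x_bar for x in sample]    # e_j = x_j - x_bar

# Variance and standard deviation estimated from the residuals.
s_squared = sum(e ** 2 for e in residuals) / (n - 1)
s = math.sqrt(s_squared)

# Each observation is recovered as x_bar + e_j, as in Eq. (8.28).
reconstructed = [x_bar + e for e in residuals]

print(f"x_bar = {x_bar}, s^2 = {s_squared}, s = {s:.3f}")
print("sum of residuals:", sum(residuals))
```

The residuals always sum to zero, which is why only $n - 1$ of them are free to vary and $n - 1$ appears in the denominator of $s^2$.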

Next, we extend the above notations to two populations. Let $x_1$ be
distributed with mean $\mu_1$ and variance $\sigma_1^2$, and let $x_2$ be distributed with
mean $\mu_2$ and variance $\sigma_2^2$. Let $x_{1j}$ denote the $j$th observation of a sample
of size $n_1$ drawn from the first population, and $x_{2j}$ the $j$th observation of
a sample of size $n_2$ drawn from the second population. Then the model
equations may be written as

$$x_{1j} = \mu_1 + \epsilon_{1j} \qquad (j = 1, 2, \ldots, n_1)$$

and

$$x_{2j} = \mu_2 + \epsilon_{2j} \qquad (j = 1, 2, \ldots, n_2)$$

These two equations may be written as

$$x_{ij} = \mu_i + \epsilon_{ij} \qquad (i = 1, 2;\ j = 1, \ldots, n_i) \tag{8.29}$$

where $\epsilon_{ij}$ denotes the amount $x_{ij}$ deviates from $\mu_i$. Further, the observation
equation may be written as

$$x_{ij} = \bar{x}_i + e_{ij} \qquad (i = 1, 2;\ j = 1, 2, \ldots, n_i) \tag{8.30}$$

where

$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} \qquad (i = 1, 2) \tag{8.31}$$

and $e_{ij}$ denotes the amount $x_{ij}$ deviates from the sample mean $\bar{x}_i$. If $k$
populations are to be considered, and $x_i$ is distributed with mean $\mu_i$ and
variance $\sigma_i^2$ $(i = 1, 2, \ldots, k)$, the last three equations, Eqs. (8.29), (8.30),
and (8.31), remain the same, except that $i = 1, 2, \ldots, k$ and $n_i$ denotes
the size of the $i$th sample.

When two or more population means are compared, it is often most
convenient to think in terms of the amount any given mean $\mu_i$ deviates from
an over-all population mean $\mu$. For example, when comparing the mean
verbal college board scores of all freshmen at two or more schools, we
might wish to know how much the mean score of college A, say, differs
from the mean of $k$ schools in a given state. In another example, we may
wish to know how much each of $k$ like machines in factory B on the average
differs from a production standard in its ability to do a specific piece of work.
(More examples and a fuller treatment of this topic are given in the chapters
on analysis of variance.) In order to give a detailed comparison of the
concepts about to be introduced with those discussed in Sect. 8.6, we consider
the case where $k = 2$.

Let two populations $P_1$ and $P_2$ have means $\mu_1$ and $\mu_2$, respectively, and let

$$\mu = \frac{\mu_1 + \mu_2}{2} \tag{8.32}$$

be the over-all mean of the two populations combined. Letting $\alpha_i$ denote
the amount the population mean $\mu_i$ deviates from the mean of the two
population means, we may write

$$\mu_i = \mu + \alpha_i \qquad (i = 1, 2) \tag{8.33}$$

We call $\alpha_i$ the true effect of the $i$th population and note that

$$\sum_{i=1}^{2} \alpha_i = 0 \tag{8.34}$$

Thus, substituting Eq. (8.33) in Eq. (8.29) gives

$$x_{ij} = \mu + \alpha_i + \epsilon_{ij} \qquad (i = 1, 2;\ j = 1, \ldots, n_i) \tag{8.35}$$

which is the model equation written in terms of true effects. There is an
observation equation corresponding to Eq. (8.35). Let the over-all sample
mean $\bar{x}$ be defined by

$$\bar{x} = \frac{\sum_{i=1}^{2}\sum_{j=1}^{n_i} x_{ij}}{n_1 + n_2} \tag{8.36}$$

or

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} \tag{8.36a}$$

Let the sample effect of the $i$th population, or, for short, the $i$th sample effect,
be denoted by $a_i$ and defined by

$$a_i = \bar{x}_i - \bar{x} \qquad (i = 1, 2) \tag{8.37}$$

Thus

$$\bar{x}_i = \bar{x} + a_i \qquad (i = 1, 2) \tag{8.38}$$

Substituting Eq. (8.38) in Eq. (8.30) gives

$$x_{ij} = \bar{x} + a_i + e_{ij} \qquad (i = 1, 2;\ j = 1, 2, \ldots, n_i) \tag{8.39}$$

which is the observation equation in terms of the estimated effects. The
over-all sample mean $\bar{x}$ estimates the over-all population mean $\mu$, and the $i$th
sample effect $a_i$ estimates the true effect $\alpha_i$. It should be noted that, in
general,

$$\sum_{i=1}^{2} a_i \ne 0$$

unless $n_1 = n_2$. However, it is always true that

$$\sum_{i=1}^{2}\sum_{j=1}^{n_i} a_i = 0 \tag{8.40}$$

The relations given by Eqs. (8.32) through (8.40) can be extended in an
obvious way to include $k$ populations, in which case $i = 1, 2, \ldots, k$, and
$n_i$ denotes the size of the $i$th sample.
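
The behavior of the sample effects is easy to check numerically. The Python sketch below is an illustration under stated assumptions: the two samples are illustrative values with unequal sizes, the variable names are hypothetical, and the over-all sample mean is taken to be the mean of all $n_1 + n_2$ observations, which is what Eq. (8.40) requires.

```python
# Two illustrative samples of unequal size (an assumption for this sketch).
samples = {1: [27, 18, 10, 4, 19, 30, 11], 2: [33, 9, 24, 46]}

sizes = {i: len(xs) for i, xs in samples.items()}
means = {i: sum(xs) / len(xs) for i, xs in samples.items()}
n_total = sum(sizes.values())

# Over-all sample mean: all observations pooled, or equivalently the
# size-weighted mean of the two sample means.
overall_mean = sum(sum(xs) for xs in samples.values()) / n_total
weighted_mean = sum(sizes[i] * means[i] for i in samples) / n_total

# Sample effects a_i and residuals e_ij.
effects = {i: means[i] - overall_mean for i in samples}
residuals = {i: [x - means[i] for x in xs] for i, xs in samples.items()}

unweighted_sum = sum(effects.values())                    # nonzero when n_1 != n_2
weighted_sum = sum(sizes[i] * effects[i] for i in samples)  # always zero

print("sample means:", means)
print("over-all mean:", overall_mean, "effects:", effects)
print("sum of a_i:", unweighted_sum, " sum over i and j of a_i:", weighted_sum)
```

With sample sizes seven and four, the effects themselves do not sum to zero, but the sum taken over every observation does.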

The deviates $e_{i1}, e_{i2}, \ldots, e_{in_i}$ about the mean of a random sample from
population $P_i$ may be used to estimate the variance $\sigma_i^2$ of this population.
The variance estimator is given by

$$s_i^2 = \frac{\sum_{j=1}^{n_i} e_{ij}^2}{n_i - 1} \qquad (i = 1, 2) \tag{8.41}$$

If two populations have common variance $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then

$$s_p^2 = \frac{\sum_{j=1}^{n_1} e_{1j}^2 + \sum_{j=1}^{n_2} e_{2j}^2}{n_1 + n_2 - 2} = \frac{\sum_{i=1}^{2}\sum_{j=1}^{n_i} e_{ij}^2}{\sum_{i=1}^{2} n_i - 2} \tag{8.42}$$

is an estimator of $\sigma^2$. Sometimes $s_p^2$ is called error variance and denoted by
$s_e^2$. Actually, $s_1^2$, $s_2^2$, and $s_p^2$ are all estimators of $\sigma^2$ when the populations have
common variance, but $s_p^2$ is preferred, since it has more degrees of freedom.
Since, for a given sample, the $e$'s of Eqs. (8.30) and (8.39) are the same,
we may think in terms of either model equation, Eq. (8.29) or Eq. (8.35),
when establishing confidence intervals or testing hypotheses. Now we use
an example with unequal sample sizes to compare the methods and notation
of Sect. 8.6 with those of this section.

Example 8.8. The following random samples are drawn from normal
populations with common variance $\sigma^2$.

Sample from $P_1$: 27, 18, 10, 4, 19, 30, 11

Sample from $P_2$: 33, 9, 24, 46

(a) Test the hypothesis $\mu_1 = \mu_2$, using the methods of Sect. 8.6. (b)
Use the notation of this section to describe (a). (c) If, in addition to the
information given above, it is known that $\mu_1 = \mu_2 = \mu = 20$ and
$\sigma_1^2 = \sigma_2^2 = \sigma^2 = 100$, compare estimated means and effects with true means
and effects. (d) Discuss related topics.

In using a five per cent level test of the hypothesis $H_0\colon \mu_1 = \mu_2$ against
the alternative hypothesis $H_a\colon \mu_1 \neq \mu_2$, we need the $t$ statistic with nine
degrees of freedom to find the symmetric two-tailed critical region defined
by $|t| > 2.262$. Since $\bar{x}_1 = 17$, $\bar{x}_2 = 28$, $SS_1 = 528$, $SS_2 = 726$, and
$s_p^2 = 1254/9 = 139.33$, the calculated $t$ statistic is

$$t_c = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{17 - 28}{\sqrt{139.33 \left( \frac{1}{7} + \frac{1}{4} \right)}} = -1.49$$

The statistic $t_c$ falls in the noncritical region. Therefore, we fail to reject
$H_0$ and conclude that we do not have enough evidence to say that the
means are different.

Observe that Eq. (8.29) is the model equation when the hypothesis
$H_0\colon \mu_1 = \mu_2 = \mu$ is being tested. For if $\mu_1 = \mu_2$, then

$$\mu_i = \mu \qquad (i = 1, 2)$$

Further, we may wish to write $H_0$ as

$$\mu_1 - \mu = \mu_2 - \mu = 0$$

or, using Eq. (8.33), as

$$\alpha_1 = \alpha_2 = 0 \tag{8.43}$$

Thus, the hypothesis that the population means are the same is the same
as the hypothesis that the true effects are zero. The alternative hypothesis
$H_a\colon \mu_1 \neq \mu_2$ may also be written as $H_a\colon \alpha_1 \neq \alpha_2$. Thus, when the
alternative hypothesis is $\alpha_1 \neq \alpha_2$, rejection of Eq. (8.43) leads to the acceptance
of the statement, "The true effects are not equal." If the alternative
hypothesis is $\alpha_1 < \alpha_2$ ($\alpha_1 > \alpha_2$), the rejection of Eq. (8.43) leads to the
statement, "The true effect for population 1 is less (greater) than the true
effect for population 2." The pooled error variance can be computed from
the $e$'s as

$$s_p^2 = \frac{(27 - 17)^2 + \cdots + (11 - 17)^2 + (33 - 28)^2 + \cdots + (46 - 28)^2}{7 + 4 - 2} = 139.33$$

For (c) we write each observed value as the sum of three components.
First, using Eq. (8.39) and the fact that $\bar{x} = 21$, $a_1 = 17 - 21 = -4$,
and $a_2 = 28 - 21 = 7$, we obtain the results of Table 8.13a. Next, using
Eq. (8.35) and the knowledge that $\mu_1 = \mu_2 = \mu = 20$, we obtain the values
in Table 8.13b. The true over-all mean is 20, and the estimate is 21. The
true population effects are zero and the estimated effects are $-4$ and 7,
respectively.


Table 8.13a

Estimated Effects for Example 8.8

Sample 1                                  Sample 2
$x_{1j} = \bar{x} + a_1 + e_{1j}$               $x_{2j} = \bar{x} + a_2 + e_{2j}$

27 = 21 + (-4) + 10                       33 = 21 + 7 + 5
18 = 21 + (-4) + 1                         9 = 21 + 7 + (-19)
10 = 21 + (-4) + (-7)                     24 = 21 + 7 + (-4)
 4 = 21 + (-4) + (-13)                    46 = 21 + 7 + 18
19 = 21 + (-4) + 2
30 = 21 + (-4) + 13
11 = 21 + (-4) + (-6)

$\sum_j x_{1j}$ = 119 = 7(21) + 7(-4) + 0        $\sum_j x_{2j}$ = 112 = 4(21) + 4(7) + 0
$\bar{x}_1$ = 17 = 21 + (-4)                     $\bar{x}_2$ = 28 = 21 + 7
$a_1 = -4$                                  $a_2 = 7$


Table 8.13b

True Effects for Example 8.8

Sample 1                                  Sample 2
$x_{1j} = \mu + \alpha_1 + \epsilon_{1j}$             $x_{2j} = \mu + \alpha_2 + \epsilon_{2j}$

27 = 20 + 0 + 7                           33 = 20 + 0 + 13
18 = 20 + 0 + (-2)                         9 = 20 + 0 + (-11)
10 = 20 + 0 + (-10)                       24 = 20 + 0 + 4
 4 = 20 + 0 + (-16)                       46 = 20 + 0 + 26
19 = 20 + 0 + (-1)
30 = 20 + 0 + 10
11 = 20 + 0 + (-9)
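
The decomposition used in parts (a) through (c) of Example 8.8 can be retraced numerically. The following is only a minimal sketch in Python (the language and all variable names are choices made here, not part of the text); it recomputes the over-all sample mean, the sample effects, the residuals, the pooled variance of Eq. (8.42), and the two-sample $t$ statistic from the two samples given in the example.

```python
from math import sqrt

# Data of Example 8.8 (two samples of unequal size).
samples = [[27, 18, 10, 4, 19, 30, 11], [33, 9, 24, 46]]

n = [len(s) for s in samples]
grand_mean = sum(sum(s) for s in samples) / sum(n)       # over-all sample mean
means = [sum(s) / len(s) for s in samples]               # sample means
effects = [m - grand_mean for m in means]                # sample effects a_i
residuals = [[x - m for x in s] for s, m in zip(samples, means)]  # e_ij

# Each observation is recovered as grand mean + effect + residual, Eq. (8.39).
rebuilt = [[grand_mean + a + e for e in es] for a, es in zip(effects, residuals)]

# Pooled (error) variance, Eq. (8.42), and the two-sample t statistic.
ss_error = sum(e * e for es in residuals for e in es)
pooled_var = ss_error / (sum(n) - 2)
t_stat = (means[0] - means[1]) / sqrt(pooled_var * (1 / n[0] + 1 / n[1]))

print(grand_mean, effects, round(pooled_var, 2), round(t_stat, 2))
```

The effects weighted by the sample sizes add to zero, as Eq. (8.40) requires, while the unweighted sum $a_1 + a_2$ does not, because the sample sizes differ.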

For our discussion of (d) we start with

$$\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) = \sum_{j=1}^{n_i} e_{ij} = 0 \qquad (i = 1, \ldots, k) \tag{8.44}$$

and from Table 8.13a we observe that this relation holds for Samples 1 and
2. However, in general

$$\sum_{j=1}^{n_i} (x_{ij} - \mu_i) = \sum_{j=1}^{n_i} \epsilon_{ij} \neq 0 \qquad (i = 1, \ldots, k) \tag{8.45}$$

In particular, using Table 8.13b, we see that

$$\sum_{j} \epsilon_{1j} = 7 + (-2) + \cdots + (-9) = -21 \neq 0$$

and

$$\sum_{j} \epsilon_{2j} = 13 + (-11) + 4 + 26 = 32 \neq 0$$

Since $\epsilon$ is a random variable, the $\epsilon_{ij}$'s may be used to estimate the common
population variance $\sigma^2$. The estimate is

$$s^2 = \frac{7^2 + (-2)^2 + \cdots + (-9)^2 + 13^2 + \cdots + 26^2}{11} = \frac{1573}{11} = 143$$

It can be shown that $s^2$ has 11 degrees of freedom and that $11 s^2/\sigma^2$ is
distributed as $\chi^2$ with 11 degrees of freedom.

8.12. SUM OF SQUARES IDENTITIES AND RELATED TOPICS

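It is informative to note that, for a sample of $n_i$ values from a population
with mean $\mu_i$ and variance $\sigma_i^2$,

$$\sum_{j=1}^{n_i} (x_{ij} - \mu_i)^2 = \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + n_i (\bar{x}_i - \mu_i)^2 \qquad (i = 1, 2, \ldots, k) \tag{8.46}$$

The proof of Eq. (8.46) follows:

$$\sum_{j} (x_{ij} - \mu_i)^2 = \sum_{j} [(x_{ij} - \bar{x}_i) + (\bar{x}_i - \mu_i)]^2$$

$$= \sum_{j} (x_{ij} - \bar{x}_i)^2 + 2(\bar{x}_i - \mu_i) \sum_{j} (x_{ij} - \bar{x}_i) + \sum_{j} (\bar{x}_i - \mu_i)^2$$

$$= \sum_{j} (x_{ij} - \bar{x}_i)^2 + n_i (\bar{x}_i - \mu_i)^2$$

since

$$\sum_{j} (x_{ij} - \bar{x}_i) = 0$$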

Equation (8.46) is called a sum of squares identity for one sample, and is an
algebraic identity which in no way depends on any distribution assumptions
associated with $x_{ij}$. If the sample is randomly drawn we know that

$$s_i^2 = \frac{\sum_{j} (x_{ij} - \bar{x}_i)^2}{n_i - 1}$$

with $(n_i - 1)$ degrees of freedom is an unbiased estimator of $\sigma_i^2$, and we
have just indicated that

$$\frac{\sum_{j} (x_{ij} - \mu_i)^2}{n_i}$$

with $n_i$ degrees of freedom is an unbiased estimator of $\sigma_i^2$. Further, since
$\bar{x}_i$ is distributed with mean $\mu_i$ and variance

$$\sigma_{\bar{x}_i}^2 = \frac{\sigma_i^2}{n_i}$$

it follows that $(\bar{x}_i - \mu_i)^2/1$ with one degree of freedom is an unbiased
estimator of $\sigma_i^2/n_i$. Therefore

$$\frac{n_i (\bar{x}_i - \mu_i)^2}{1} \tag{8.47}$$

with one degree of freedom is an unbiased estimator of $\sigma_i^2$. Associated with
Eq. (8.46) we have a degree of freedom identity given by

$$n_i = (n_i - 1) + 1 \tag{8.48}$$


The two identities, Eqs. (8.46) and (8.48), and their extensions are
useful in statistics in many ways. For example, the three ratios

$$\frac{\sum_{j} (x_{ij} - \mu_i)^2}{n_i}, \qquad \frac{\sum_{j} (x_{ij} - \bar{x}_i)^2}{n_i - 1}, \qquad \text{and} \qquad \frac{\sum_{j} (\bar{x}_i - \mu_i)^2}{1} \tag{8.49}$$

obtained from these two identities are all unbiased estimators of $\sigma_i^2$, provided
the sample is randomly drawn. Thus, the three ratios

$$\frac{\sum_{j} (x_{ij} - \mu_i)^2}{\sigma_i^2}, \qquad \frac{\sum_{j} (x_{ij} - \bar{x}_i)^2}{\sigma_i^2}, \qquad \text{and} \qquad \frac{\sum_{j} (\bar{x}_i - \mu_i)^2}{\sigma_i^2} \tag{8.50}$$

obtained from Eq. (8.46) are distributed as $\chi^2$ with $n_i$ degrees, $n_i - 1$
degrees, and one degree of freedom, respectively, if the sample is drawn
from a normal population. Further, if two of the sums of squares in Eq.
(8.46) are known, the third can be computed directly by addition or
subtraction.
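
The one-sample identity can be checked on Sample 1 of Example 8.8, using the population mean $\mu_1 = 20$ assumed in part (c) of that example. The lines below are an illustrative sketch only (Python and the variable names are choices made here); they evaluate both sides of Eq. (8.46) and the three ratios of Eq. (8.49).

```python
# Sample 1 of Example 8.8, with the population mean mu = 20 assumed in part (c).
x = [27, 18, 10, 4, 19, 30, 11]
mu = 20
n = len(x)
xbar = sum(x) / n

ss_about_mu = sum((v - mu) ** 2 for v in x)       # left side of Eq. (8.46)
ss_about_mean = sum((v - xbar) ** 2 for v in x)   # first term on the right
ss_mean = n * (xbar - mu) ** 2                    # second term on the right

# The three ratios of Eq. (8.49), with n, n - 1, and 1 degrees of freedom.
estimates = (ss_about_mu / n, ss_about_mean / (n - 1), ss_mean / 1)

print(ss_about_mu, ss_about_mean, ss_mean, estimates)
```

Here 591 = 528 + 63, and the degrees of freedom split in the same way, 7 = 6 + 1. The three ratios differ widely (about 84, 88, and 63), which is to be expected of estimates based on so few degrees of freedom.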

Returning to the case of two samples drawn from like populations with
the same mean $\mu$ and same variance $\sigma^2$, we can show that

$$\sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = \sum_{i} \sum_{j} (\bar{x}_i - \bar{x})^2 + \sum_{i} \sum_{j} (x_{ij} - \bar{x}_i)^2 \tag{8.51a}$$

or

$$\sum_{i} \sum_{j} (a_i + e_{ij})^2 = \sum_{i} \sum_{j} a_i^2 + \sum_{i} \sum_{j} e_{ij}^2 \tag{8.51b}$$

These equations are algebraic identities, called sum of squares identities,
and in no way depend on any distribution assumptions associated with
$x_{ij}$. To prove Eq. (8.51b), write

$$\sum_{i} \sum_{j} (a_i + e_{ij})^2 = \sum_{i} \sum_{j} a_i^2 + 2 \sum_{i} a_i \sum_{j} e_{ij} + \sum_{i} \sum_{j} e_{ij}^2$$

$$= \sum_{i} \sum_{j} a_i^2 + \sum_{i} \sum_{j} e_{ij}^2$$

since $\sum_{j} e_{ij} = 0$ for each $i$.

If the samples are randomly and independently drawn, we know that

$$s_e^2 = \frac{\sum_{i} \sum_{j} e_{ij}^2}{n_1 + n_2 - 2}$$

with $n_1 + n_2 - 2$ degrees of freedom is an unbiased estimator of $\sigma^2$. Thinking
of the two samples as one large random sample of size $n_1 + n_2$ drawn
from a population with mean $\mu$ and variance $\sigma^2$, we know that

$$s_t^2 = \frac{\sum_{i} \sum_{j} (x_{ij} - \bar{x})^2}{n_1 + n_2 - 1}$$

with $n_1 + n_2 - 1$ degrees of freedom is an unbiased estimator of $\sigma^2$. Thus,
we suspect that

$$\frac{\sum_{i} \sum_{j} (\bar{x}_i - \bar{x})^2}{1}$$

with 1 degree of freedom is an unbiased estimator of $\sigma^2$. If $n_1 = n_2 = n$,

$$\sum_{i} \sum_{j} (\bar{x}_i - \bar{x})^2 = n \sum_{i} (\bar{x}_i - \bar{x})^2$$

Now, assuming $\bar{x}_1$ and $\bar{x}_2$ to be random means from a population of means
with mean $\mu$ and variance $\sigma_{\bar{x}}^2 = \sigma^2/n$, we know that

$$\frac{\sum_{i} (\bar{x}_i - \bar{x})^2}{2 - 1}$$

is an unbiased estimator of $\sigma^2/n$. Thus,

$$\frac{n \sum_{i} (\bar{x}_i - \bar{x})^2}{1}$$

is an unbiased estimator of $\sigma^2$. It can be shown that

$$s_m^2 = \frac{\sum_{i} \sum_{j} (\bar{x}_i - \bar{x})^2}{1} \tag{8.52}$$

with one degree of freedom is an unbiased estimator of $\sigma^2$. Thus, associated
with (8.51) we have the degrees of freedom identity

$$n_1 + n_2 - 1 = (n_1 + n_2 - 2) + 1 \tag{8.53}$$

The two identities (8.51) and (8.53) and their extensions are very
important in all analysis of variance (see Chaps. 10-13). For example, if the
samples are randomly and independently drawn from populations with
common mean $\mu$ and common variance $\sigma^2$, the three ratios

$$s_t^2 = \frac{\sum_{i} \sum_{j} (x_{ij} - \bar{x})^2}{n_1 + n_2 - 1}, \qquad s_e^2 = \frac{\sum_{i} \sum_{j} (x_{ij} - \bar{x}_i)^2}{n_1 + n_2 - 2},$$

and

$$s_m^2 = \frac{\sum_{i} \sum_{j} (\bar{x}_i - \bar{x})^2}{1} \tag{8.54}$$

obtained from these two identities are unbiased estimators of $\sigma^2$. If, in
addition to the assumptions just mentioned, the populations are normal,
then the three ratios

$$\frac{\sum_{i} \sum_{j} (x_{ij} - \bar{x})^2}{\sigma^2}, \qquad \frac{\sum_{i} \sum_{j} (x_{ij} - \bar{x}_i)^2}{\sigma^2}, \qquad \text{and} \qquad \frac{\sum_{i} \sum_{j} (\bar{x}_i - \bar{x})^2}{\sigma^2} \tag{8.55}$$

obtained from Eq. (8.51) are distributed as $\chi^2$ with $n_1 + n_2 - 1$ degrees,
$n_1 + n_2 - 2$ degrees, and one degree of freedom, respectively.

Since $s_e^2$ is computed using only $e$'s and $s_m^2$ is computed using only $a$'s,
one would expect $s_e^2$ and $s_m^2$ to be independent estimators of $\sigma^2$; since $s_t^2$ is
computed using both $a$'s and $e$'s, one would expect $s_t^2$ not to be independent
of $s_m^2$ and $s_e^2$. Each is actually the case. Further, the sum of squares identity
is useful in finding the third sum of squares in terms of the other two.

In particular, from Example 8.8 we find, using Table 8.13a, that the three
sums of squares are

$$\sum_{i} \sum_{j} (a_i + e_{ij})^2 = \sum_{i} \sum_{j} (x_{ij} - \bar{x})^2$$

$$= (27 - 21)^2 + \cdots + (11 - 21)^2 + (33 - 21)^2 + \cdots + (46 - 21)^2 = 1562$$

$$\sum_{i} \sum_{j} a_i^2 = \sum_{i} n_i (\bar{x}_i - \bar{x})^2 = 7(-4)^2 + 4(7)^2 = 308$$

and

$$\sum_{i} \sum_{j} e_{ij}^2 = \sum_{i} \sum_{j} (x_{ij} - \bar{x}_i)^2 = 1254$$

Actually, $\sum_{i} \sum_{j} e_{ij}^2$ is usually found by subtraction as $1562 - 308 = 1254$. From these sums of
squares we find the following estimates of the variance:
$s_t^2 = 1562/10 = 156.2$, $s_m^2 = 308$, and $s_e^2 = 1254/9 = 139.33$. Since $\alpha_1 = \alpha_2 = 0$ in
this example, the estimate $s^2 = 143$ computed from the $\epsilon$'s is comparable to all these estimates. However, in
the absence of any real knowledge about $\mu_1$ and $\mu_2$, only $s_e^2 = 139.33$ can
be considered an unbiased estimate of $\sigma^2$; the other estimates can be
expected to be inflated when $\alpha_1$ and $\alpha_2$ differ from zero.
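
The sums of squares quoted for Example 8.8 can be reproduced directly from the data. The following sketch (Python; all names are illustrative and not taken from the text) computes the three sums of squares in the identity (8.51a), the matching degrees of freedom of (8.53), and the three variance estimates.

```python
# Data of Example 8.8.
samples = [[27, 18, 10, 4, 19, 30, 11], [33, 9, 24, 46]]

N = sum(len(s) for s in samples)
grand_mean = sum(sum(s) for s in samples) / N
means = [sum(s) / len(s) for s in samples]

# Terms of the sum of squares identity (8.51a).
ss_total = sum((x - grand_mean) ** 2 for s in samples for x in s)
ss_means = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
ss_error = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

# Degrees of freedom identity (8.53): N - 1 = (N - 2) + 1 for two samples.
df_total, df_means, df_error = N - 1, len(samples) - 1, N - len(samples)

# Variance estimates: total, among means, and error.
s_t2 = ss_total / df_total
s_m2 = ss_means / df_means
s_e2 = ss_error / df_error

print(ss_total, ss_means, ss_error, s_t2, s_m2, round(s_e2, 2))
```

The same three lines of arithmetic extend without change to more than two samples, which is the form in which the identity is used in analysis of variance.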

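In the remainder of this section we consider the case where all samples
are of size $n$. The sum of squares identities, Eqs. (8.46) and (8.51a), may
then be written as

$$\sum_{j=1}^{n} (x_{ij} - \mu_i)^2 = \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2 + n (\bar{x}_i - \mu_i)^2 \qquad (i = 1, \ldots, k) \tag{8.56}$$

and

$$\sum_{i=1}^{2} \sum_{j=1}^{n} (x_{ij} - \bar{x})^2 = \sum_{i=1}^{2} \sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2 + n \sum_{i=1}^{2} (\bar{x}_i - \bar{x})^2 \tag{8.57}$$

respectively, and the degrees of freedom identities (8.48) and (8.53) as

$$n = (n - 1) + 1 \tag{8.58}$$

and

$$2n - 1 = 2(n - 1) + 1 \tag{8.59}$$

respectively. Since, in this case

$$\bar{x} = \frac{\bar{x}_1 + \bar{x}_2}{2}$$

we have

$$\sum_{i=1}^{2} (\bar{x}_i - \bar{x})^2 = \frac{(\bar{x}_1 - \bar{x}_2)^2}{2}$$

or

$$s_m^2 = \frac{n (\bar{x}_1 - \bar{x}_2)^2}{2}$$

Thus, the ratio of the two independent variance estimators $s_m^2$ and $s_e^2$ becomes

$$\frac{s_m^2}{s_e^2} = \frac{n (\bar{x}_1 - \bar{x}_2)^2 / 2}{s_e^2} = \frac{(\bar{x}_1 - \bar{x}_2)^2}{2 s_e^2 / n} = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_{\bar{x}_1 - \bar{x}_2}^2} \tag{8.60}$$

which we recognize as the square of the $t$ statistic used in testing the
hypothesis $\mu_1 = \mu_2$ (or $\mu_1 - \mu_2 = 0$). That is, $t^2 = s_m^2/s_e^2$, where $t$ is
distributed as the Student $t$ with $2n - 2$ degrees of freedom. Also, the ratio of
the two independent variance estimators

$$s_i'^2 = \frac{n (\bar{x}_i - \mu_i)^2}{1} \qquad \text{and} \qquad s_i^2 = \frac{\sum_{j} (x_{ij} - \bar{x}_i)^2}{n - 1} \qquad (i = 1, 2)$$

becomes

$$\frac{s_i'^2}{s_i^2} = \frac{n (\bar{x}_i - \mu_i)^2 / 1}{s_i^2} = \frac{(\bar{x}_i - \mu_i)^2}{s_i^2 / n} = \frac{(\bar{x}_i - \mu_i)^2}{s_{\bar{x}_i}^2} \tag{8.61}$$

which we recognize as the square of the $t$ statistic used in testing the
hypothesis $\mu_i = \mu_0$. That is, $t^2 = s_i'^2/s_i^2$, where $t$ is distributed as the Student
$t$ with $n - 1$ degrees of freedom. Thus, it is possible to use the ratio of
two variance estimators in place of the Student $t$ distribution for a test of
the hypothesis $\mu_i = \mu_0$ (or $\mu_1 = \mu_2$) against the alternative hypothesis
$\mu_i \neq \mu_0$ (or $\mu_1 \neq \mu_2$).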

Since the $t$ statistic involves calculating a square root, we might well
prefer using the ratio of variance estimators if we only had its sampling
distribution. Further, when $s_m^2$ is computed for more than two means,
say $k$ means, we cannot use the $t$ distribution to test the hypothesis
$\mu_1 = \mu_2 = \cdots = \mu_k$. However, the ratio $s_m^2/s_e^2$ may be used for such a
test. In the next chapters we study the statistic $s_m^2/s_e^2$, whose sampling
distribution is a special case of the well-known $F$ distribution.
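
The relation $t^2 = s_m^2/s_e^2$ for two samples of equal size can be illustrated with a small numerical sketch. The data below are hypothetical (they do not come from the text) and were chosen only so that the arithmetic is easy to follow; Python is likewise a choice made here.

```python
from math import sqrt

# Hypothetical data: two samples of equal size n = 5 (not from the text).
x1 = [12.0, 15.0, 11.0, 14.0, 13.0]
x2 = [16.0, 18.0, 14.0, 17.0, 15.0]
n = len(x1)

m1, m2 = sum(x1) / n, sum(x2) / n

# Error variance with 2n - 2 degrees of freedom.
s_e2 = (sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)) / (2 * n - 2)

# Variance estimate from the two means, one degree of freedom.
s_m2 = n * (m1 - m2) ** 2 / 2

# Pooled two-sample t statistic for the hypothesis mu_1 = mu_2.
t = (m1 - m2) / sqrt(s_e2 * 2 / n)

ratio = s_m2 / s_e2
print(t, t * t, ratio)
```

For these values $t = -3$ and $s_m^2/s_e^2 = 22.5/2.5 = 9$, so the variance ratio reproduces $t^2$; the sign of $t$ is lost, which is why the ratio corresponds to the two-sided test.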


8.13. EXERCISES

8.21. (a) Use the notation of Sect. 8.11 and the methods of Example 8.8 to
discuss Exercise 8.10. (b) If, in addition to the data given in Exercise
8.10, it is known that $\mu_1 = 90$, $\mu_2 = 80$, and $\sigma_1^2 = \sigma_2^2 = \sigma^2 = 900$,
compare estimated means and effects with true means and effects. (c)
Using the information in (a) and (b), verify Eq. (8.46) for both sample
A and sample B. Use Eq. (8.49) to find six estimates of $\sigma^2 = 900$. Which
of these estimates are unbiased estimates of $\sigma^2 = 900$? (d) Verify Eq.
(8.51a). Find $s_t^2$, $s_m^2$, and $s_e^2$. Which of these estimates of $\sigma^2 = 900$ are
unbiased?

8.22. (a) Use the notation of Sect. 8.11 to discuss Exercise 8.11. (b) If, in
addition to the data given in Exercise 8.11, it is known that $\mu_1 = 400$,
$\mu_2 = 150$, $\sigma_1^2 = 1000$, and $\sigma_2^2 = 15{,}000$, compare estimated means and
effects with true means and effects. (c) Using the information in (a)
and (b), verify Eq. (8.46) for sample A and for sample B. Use Eq. (8.49)
to find three estimates of $\sigma_1^2$ and three estimates of $\sigma_2^2$. (d) Verify Eq.
(8.51a). Find $s_t^2$, $s_m^2$, and $s_e^2$. Which of these estimates are unbiased?
Explain.

8.23. Three random samples are drawn from normal populations with
common variance 100. The sample values and population means are as
follows:


Table 8.14

Sample     Sample Values                True Mean
  1        $4    42    54    59    49       50
  2        60    S9    7*    64    }$       60
  3        94    94   119    96   ICM      100


(a) Prepare tables like Tables 8.13a and 8.13b. (b) For each sample verify
Eqs. (8.44), (8.45), and (8.46). (c) Verify Eq. (8.51b). Compute $s_e^2$, $s_m^2$,
and $s_m^2/s_e^2$. Is $s_m^2/s_e^2$ what you expected it to be? Explain.
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SAMPLING 

FROM NORMAL POPULATIONS- 
F DISTRIBUTION WITH APPLICATIONS 


Properties of the F distribution are explained. It is shown how the F
distribution is useful in problems involving two variances. This includes
discussions of confidence intervals, hypothesis testing, power functions, and
the relation of the F distribution to the normal, Student t, and chi-square
distributions.

9.1. INTRODUCTION

We have already discussed problems involving one variance (Chap. 7)
and one or two means (Chap. 8). However, there are situations in which
we need to compare two variances or more than two means, and the
distributions of earlier chapters are not appropriate. It is fortunate that the
same sampling distribution, the F distribution, can be used in both cases.
The very important application to several means will be discussed in the
chapter on analysis of variance. At this time we shall treat in some detail
problems involving two variances and give approximate tests of the
hypothesis that $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$. First, we consider properties of this
very important distribution.

9.2. THE F DISTRIBUTION

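Theorem 9.1. Let the random variable $\chi_1^2$ be distributed as chi-square
with $\nu_1$ degrees of freedom and the random variable $\chi_2^2$, which is independent
of $\chi_1^2$, be distributed as a chi-square with $\nu_2$ degrees of freedom. Then

$$F = F(\nu_1, \nu_2) = \frac{\chi_1^2 / \nu_1}{\chi_2^2 / \nu_2} \tag{9.1}$$

is distributed with density function

$$f(F) = \frac{\Gamma\left( \frac{\nu_1 + \nu_2}{2} \right)}{\Gamma\left( \frac{\nu_1}{2} \right) \Gamma\left( \frac{\nu_2}{2} \right)} \left( \frac{\nu_1}{\nu_2} \right)^{\nu_1 / 2} \frac{F^{(\nu_1 / 2) - 1}}{\left( 1 + \frac{\nu_1 F}{\nu_2} \right)^{(\nu_1 + \nu_2)/2}} \qquad (F \geq 0) \tag{9.2}$$

with mean

$$\frac{\nu_2}{\nu_2 - 2} \qquad (\nu_2 > 2)$$

and variance

$$\frac{2 \nu_2^2 (\nu_1 + \nu_2 - 2)}{\nu_1 (\nu_2 - 2)^2 (\nu_2 - 4)} \qquad (\nu_2 > 4)$$

The gamma function $\Gamma(\alpha)$ is defined in Exercise 3.67.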

Corollary. Let $\sigma_1^2$ and $\sigma_2^2$ be variances of two normal populations. Let $s_1^2$
with $\nu_1$ degrees of freedom and $s_2^2$ with $\nu_2$ degrees of freedom be two
independent estimators of the variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Then the ratio
$s_1^2/s_2^2$ is distributed as $F$ with $\nu_1$ and $\nu_2$ degrees of freedom when $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

The outline of the proof of Theorem 9.1 is given in the exercises. The
proof of the corollary is immediate. For $s_1^2/\sigma^2$ and $s_2^2/\sigma^2$ are distributed as
$\chi^2$ per degree of freedom with $\nu_1$ and $\nu_2$ degrees of freedom, respectively,
and, therefore

$$\frac{s_1^2 / \sigma^2}{s_2^2 / \sigma^2} = \frac{s_1^2}{s_2^2}$$

is distributed as $F$ with $\nu_1$ and $\nu_2$ degrees of freedom.

The statistic defined in Eq. (9.1) is said to have the $F$ distribution with
$\nu_1$ and $\nu_2$ degrees of freedom. Thus, just as with $t$ and $\chi^2$, the symbol $F$ is
used in referring to both the "F variate" (or F statistic) and the "F
distribution." R. A. Fisher [11, 12] originally developed the exact distribution of
$z = \frac{1}{2} \ln (s_1^2/s_2^2)$, i.e., the $z$ distribution. Snedecor [22] studied $s_1^2/s_2^2 = e^{2z}$,
a modified version of $z$, and published tables for its use. Snedecor called
the distribution of the variance ratio $s_1^2/s_2^2$ the $F$ distribution in honor of
Fisher. Other sources of tables for $F$ as well as $t$ and $\chi^2$ distributions are
given by Bancroft [2].

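The density function, Eq. (9.2), represents a two-parameter family of
distributions. The graph of three members of this family is shown in Fig.
9.1 and illustrates the fact that the "shape" of the distribution changes with
the degrees of freedom.

Fig. 9.1. F Distribution Curves for $(\nu_1, \nu_2)$ = (4, 4), (10, 4), and (4, 25).
Percentage points are indicated.

Let $F_\alpha = F_\alpha(\nu_1, \nu_2)$ denote that value of $F = F(\nu_1, \nu_2)$
for which

$$P[F > F_\alpha] = \alpha$$

where $\alpha$ is any value in the interval $0 < \alpha \leq 0.5$. Values of $F_\alpha$, $\alpha$ percentage
points, corresponding to selected values of $\alpha$ and the most useful values
of $\nu_1$ and $\nu_2$ are shown in Table VII. Usually, in applications, only percentage
points on the right tail of the $F$ distribution are required. However, if a
left-tail value $F_{1-\alpha}(\nu_1, \nu_2)$ is needed it can be found using the relation

$$F_{1-\alpha}(\nu_1, \nu_2) = \frac{1}{F_\alpha(\nu_2, \nu_1)} \tag{9.5}$$

and the right-tail value $F_\alpha(\nu_2, \nu_1)$. To prove Eq. (9.5) observe that
$F_{1-\alpha}(\nu_1, \nu_2)$ satisfies the equation

$$P\left[ \frac{\chi_1^2 / \nu_1}{\chi_2^2 / \nu_2} > F_{1-\alpha}(\nu_1, \nu_2) \right] = 1 - \alpha$$

or

$$P\left[ \frac{\chi_2^2 / \nu_2}{\chi_1^2 / \nu_1} < \frac{1}{F_{1-\alpha}(\nu_1, \nu_2)} \right] = 1 - \alpha$$

or

$$P\left[ \frac{\chi_2^2 / \nu_2}{\chi_1^2 / \nu_1} > \frac{1}{F_{1-\alpha}(\nu_1, \nu_2)} \right] = \alpha \tag{9.6}$$

but $F_\alpha(\nu_2, \nu_1)$ is a value such that

$$P\left[ \frac{\chi_2^2 / \nu_2}{\chi_1^2 / \nu_1} > F_\alpha(\nu_2, \nu_1) \right] = \alpha \tag{9.7}$$

When we compare Eqs. (9.6) and (9.7), it is clear that
$1/F_{1-\alpha}(\nu_1, \nu_2) = F_\alpha(\nu_2, \nu_1)$, and thus Eq. (9.5) holds.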
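
The reciprocal relation (9.5) can be checked numerically without tables. The sketch below is an illustration under stated assumptions, not a method from the text: it evaluates the density of Eq. (9.2) through logarithms and obtains tail probabilities by Simpson's rule. The function names `f_density` and `upper_tail`, the number of integration steps, and the trial values 2.78 and 2.23 (used as approximate upper 2.5 and 5 per cent points) are choices made here.

```python
from math import exp, lgamma, log

def f_density(x, v1, v2):
    """Density of F(v1, v2), Eq. (9.2), evaluated through logarithms."""
    if x <= 0.0:
        return 0.0  # correct at x = 0 whenever v1 > 2
    log_const = (lgamma((v1 + v2) / 2) - lgamma(v1 / 2) - lgamma(v2 / 2)
                 + (v1 / 2) * log(v1 / v2))
    return exp(log_const + (v1 / 2 - 1) * log(x)
               - ((v1 + v2) / 2) * log(1 + v1 * x / v2))

def upper_tail(x, v1, v2, steps=4000):
    """P[F(v1, v2) > x] by Simpson's rule on [0, x]; intended for v1 > 2."""
    h = x / steps  # steps must be even
    total = f_density(0.0, v1, v2) + f_density(x, v1, v2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_density(i * h, v1, v2)
    return 1.0 - total * h / 3

# If c is the upper alpha point of F(19, 15), then 1/c is the lower alpha
# point of F(15, 19): the two probabilities below must agree.
c = 2.78
left = upper_tail(1 / c, 15, 19)     # P[F(15, 19) > 1/c]
right = 1 - upper_tail(c, 19, 15)    # P[F(19, 15) < c]

# Tail area beyond 2.23 for F(15, 19); should be close to 0.05.
tail_223 = upper_tail(2.23, 15, 19)

print(round(left, 4), round(right, 4), round(tail_223, 4))
```

The two probabilities agree to within the integration error, which is the content of Eq. (9.5). The restriction to $\nu_1 > 2$ is a limitation of this simple integration scheme, since the density is unbounded at zero when $\nu_1 < 2$.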

We know that the mean of the distribution of $\chi^2/\nu = s^2/\sigma^2$ is one.
Further, as the number of degrees of freedom approaches $\infty$, almost all
of the sample statistics $s^2/\sigma^2$ will be arbitrarily close to the mean, one.
In this case we say that $s^2/\sigma^2$ converges in probability to one or, symbolically,
if $k$ is an arbitrary small positive real number, then $P(|s^2/\sigma^2 - 1| < k) \to 1$
as $\nu \to \infty$. The mean of the distribution of

$$F = \frac{\chi_1^2 / \nu_1}{\chi_2^2 / \nu_2}$$

is

$$\frac{\nu_2}{\nu_2 - 2} \qquad (\nu_2 > 2)$$

Hence, as both $\nu_1$ and $\nu_2$ approach $\infty$, the mean of $F$ approaches 1, and
almost all of the sample statistics

$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$$

will be arbitrarily close to 1; that is, $F$ converges in probability to 1. This
means that the distance between the two values $F_{1-\alpha}$ and $F_\alpha$ decreases as
both $\nu_1$ and $\nu_2$ increase. In other words, if $\nu_1 \to \infty$ and $\nu_2 \to \infty$, then
$F_\alpha \to 1$ and $F_{1-\alpha} \to 1$. If, in particular, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, $F$ may be replaced
by $s_1^2/s_2^2$ in the above statements. These statements may be checked by looking
at the percentage points of Table VII. For a fixed $\alpha$ and a fixed $\nu_1$ ($\nu_2$),
the $F_\alpha$ values decrease as $\nu_2$ ($\nu_1$) increases, and consequently the $F_{1-\alpha}$ values
increase. For a fixed $\alpha$ the $F_\alpha$ values decrease toward 1, and the $F_{1-\alpha}$
values increase toward 1 as $\nu_1$ and $\nu_2$ increase.

Note. Using the variance of $F$ from Theorem 9.1, we see that $\sigma_F^2 \to 0$
as $\nu_1$ and $\nu_2 \to \infty$.

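In many applications the sample variance ratio $s_1^2/s_2^2$ is considered.
If $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then $s_1^2/s_2^2$ is distributed as $F$. However, if $\sigma_1^2 \neq \sigma_2^2$,
$s_1^2/s_2^2$ is distributed as

$$\frac{\sigma_1^2}{\sigma_2^2} \cdot \frac{\chi_1^2 / \nu_1}{\chi_2^2 / \nu_2} = \frac{\sigma_1^2}{\sigma_2^2} F(\nu_1, \nu_2)$$

Thus it follows that any upper $\alpha$ percentage point

$$\left( \frac{s_1^2}{s_2^2} \right)_\alpha$$

must satisfy the relation

$$\left( \frac{s_1^2}{s_2^2} \right)_\alpha = \frac{\sigma_1^2}{\sigma_2^2} F_\alpha(\nu_1, \nu_2) \tag{9.9}$$

If equal proportions of $s_1^2/s_2^2$ are to be in the tails of the distribution, then
$100(1 - \alpha)$ per cent are in the interval

$$\frac{\sigma_1^2}{\sigma_2^2} F_{1-\alpha/2}(\nu_1, \nu_2) < \frac{s_1^2}{s_2^2} < \frac{\sigma_1^2}{\sigma_2^2} F_{\alpha/2}(\nu_1, \nu_2) \tag{9.10}$$

The following $100(1 - \alpha)$ per cent confidence interval for $\sigma_1^2/\sigma_2^2$ is obtained
by solving the inequality (9.10) with respect to $\sigma_1^2/\sigma_2^2$:

$$\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2}(\nu_1, \nu_2)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{1-\alpha/2}(\nu_1, \nu_2)} \tag{9.11}$$

In practice, equal percentages are usually taken in the two tails of the
$F(\nu_1, \nu_2)$ distribution. In case $F_1$ and $F_2$ are values of $F = F(\nu_1, \nu_2)$ such
that $P(F < F_1) = \alpha_1$ and $P(F > F_2) = \alpha_2$, where $\alpha_1$ and $\alpha_2$ are both between
0 and 0.5, then Relation (9.11) may be written as

$$\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_2} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_1} \tag{9.12}$$

The inequality (9.12) gives a $100(1 - \alpha_1 - \alpha_2)$ per cent confidence interval.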

9.3. APPLICATION OF THE F DISTRIBUTION TO PROBLEMS WITH TWO 
VARIANCES 

The primary application of the F distribution occurs in analysis of vari- 
ance (Chaps. 10 through 13) where the usual hypothesis involves several 
means. At this time we consider the F distribution as it applies to exactly 
two variances; this is a brief treatment, since these problems are so similar 
to one-variance problems discussed in Chap. 7. Most of our attention in 
this section is focused on points where the treatment of the $F$ and $\chi^2$
distributions differ.

It is important to know the relative sizes of two population variances
(dispersions) when one is comparing two processes, temperature in two
locations, traffic violations during day and night, achievement in two sections
of a class, amount of foreign matter in two lots of raw material, incomes
in two cities, etc. One is usually interested in knowing which of two variances
is larger. Thus, the null hypothesis is likely to be $H_0\colon \sigma_1^2 = \sigma_2^2$ with
alternative hypothesis $H_a\colon \sigma_1^2 > \sigma_2^2$, say. The null hypothesis in this case might
also be stated as $\sigma_1^2 \leq \sigma_2^2$, since rejection of the null hypothesis leads to the
acceptance of the same alternative hypothesis, namely, $H_a\colon \sigma_1^2 > \sigma_2^2$. In
some cases, the null hypothesis is $H_0\colon \sigma_1^2 = \sigma_2^2$ with alternative hypothesis
$H_a\colon \sigma_1^2 \neq \sigma_2^2$. Test procedures, confidence intervals, and power values are
illustrated in Example 9.1.

Example 9.1. Let samples of sizes $n_1 = 16$ and $n_2 = 20$ be drawn from
populations one and two, respectively. The variance estimates are $s_1^2 = 8.9$
and $s_2^2 = 4.6$, respectively. (The reader may wish to think of populations in
his own field of interest with these variance estimates resulting from coded
data.) (a) Test the hypothesis $H_0\colon \sigma_1^2 \leq \sigma_2^2$ against $H_a\colon \sigma_1^2 > \sigma_2^2$. (b) Find a 95
per cent confidence interval for the ratio $\sigma_1^2/\sigma_2^2$. (c) Find the probability of
making the type 2 error in (a) if $\sigma_1^2 = 2\sigma_2^2$. (d) Discuss power of the test in (a).

In order for the test in (a) to be valid, we must assume that random
and independent samples were drawn from two normal populations. It
should be noted that no assumption is made concerning the means and that
the hypothesis may be stated as $H_0\colon \sigma_1^2/\sigma_2^2 \leq 1$ and $H_a\colon \sigma_1^2/\sigma_2^2 > 1$. Under
the extreme condition, $\sigma_1^2/\sigma_2^2 = 1$, of the null hypothesis we think of
$s_1^2/s_2^2 = 8.9/4.6 = 1.93$ as an $F$ statistic. For a five per cent level one-sided
test, the critical region is made up of all values of $F$ for which
$F > 2.23 = F_{.05}(15, 19)$. It should be observed that for a one-sided test
$\sigma_1^2$ and $\sigma_2^2$ can be selected so as always to use the upper tail of the $F$
distribution. Since 1.93 falls in the noncritical region, we fail to reject $H_0$. That
is, we do not have enough evidence to say that the variance of population
one is actually greater than the variance of population two.

To find the 95 per cent confidence interval given by (9.11), we first find,
using Table VII, $F_{.025}(15, 19) = 2.62$ and

$$F_{.975}(15, 19) = \frac{1}{F_{.025}(19, 15)} = \frac{1}{2.78}$$

by linear interpolation. Substituting these values of $F$ along with $s_1^2/s_2^2 = 1.93$
in (9.11) gives

$$0.74 < \frac{\sigma_1^2}{\sigma_2^2} < 5.4$$

or

$$0.74 \sigma_2^2 < \sigma_1^2 < 5.4 \sigma_2^2$$

Let

$$\lambda^2 = \frac{\sigma_1^2}{\sigma_2^2} \tag{9.13}$$

Then the power of the test of $H_0\colon \lambda^2 \leq 1$ against $H_a\colon \lambda^2 > 1$ is given by

$$p(\lambda^2) = P\left[ \frac{s_1^2}{s_2^2} > F_\alpha(\nu_1, \nu_2);\ \lambda^2 > 1 \right] = P\left[ F(\nu_1, \nu_2) > \frac{F_\alpha(\nu_1, \nu_2)}{\lambda^2} \right] \tag{9.14}$$

since $s_1^2/s_2^2$ is distributed as $\lambda^2 F(\nu_1, \nu_2)$. For part (c), where $\lambda^2 = 2$ and
$\alpha = 0.05$,

$$\beta(\lambda^2 = 2) = 1 - P\left[ F(15, 19) > \frac{2.23}{2} \right] = 1 - P[F(15, 19) > 1.11] \tag{9.15}$$

Using linear interpolation between tabled percentage points of $F(15, 19)$, we
find $P[F(15, 19) > 1.11] = 0.42$. However, using the chart on p. 60 of Vogler
and Norton [24] we find that $P[F(15, 19) > 1.11] = 0.39$. Thus, on substituting
in Eq. (9.15) we get

$$\beta(\lambda^2 = 2) = 1 - 0.39 = 0.61$$

From Eq. (9.14) it is clear that the power increases with $\lambda^2$, since
$F_\alpha(\nu_1, \nu_2)/\lambda^2$ is a decreasing function of $\lambda^2$. Further, from Eq. (9.14) and
the definition

$$p(\lambda^2) = 1 - \beta(\lambda^2) = P[F(\nu_1, \nu_2) > F_{1-\beta}(\nu_1, \nu_2)]$$

it follows that

$$F_{1-\beta}(\nu_1, \nu_2) = \frac{F_\alpha(\nu_1, \nu_2)}{\lambda^2}$$

or, using Eq. (9.5),

$$\lambda^2 = F_\alpha(\nu_1, \nu_2) \cdot F_\beta(\nu_2, \nu_1) \tag{9.16}$$

For an $\alpha$ level test and selected values of the power function
$p(\lambda^2) = 1 - \beta(\lambda^2)$, we obtain, from Table VII, $F_\alpha(\nu_1, \nu_2)$ and $F_\beta(\nu_2, \nu_1)$
and determine values of $\lambda^2$ using Eq. (9.16). By plotting the points $(\lambda^2, p(\lambda^2))$
and connecting them by a smooth curve, we may obtain the power curve
for fixed values of $\alpha$, $\nu_1$, and $\nu_2$. The reader is probably already aware of
the fact that the discussion following Eq. (9.16) is analogous to the
discussion following Eq. (7.12). This being the case, we leave it to the reader
to prepare a table like Table 7.1, to draw curves like those in Figs. 7.3 and
7.4, and to find power functions when the alternative hypothesis is
$\sigma_1^2 < \sigma_2^2$ or $\sigma_1^2 \neq \sigma_2^2$. An extensive table of $\lambda^2$ corresponding to the five per
cent and one per cent levels of significance is given by Eisenhart [23]. A table
which is useful in determining the minimum equal sample sizes for fixed
values of $\alpha$, $\beta$, and $\lambda^2$ is given by Davies [10]. At the end of Sect. 9.4 there
are statements concerning the importance of the normality assumption
in testing $\sigma_1^2 = \sigma_2^2$.
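
Parts (b) and (c) of Example 9.1 can be retraced with a short computation. The sketch below is illustrative only. The confidence limits use the tabled points quoted in the example, with $F_{.025}(19, 15)$ taken as 2.78; that figure is an assumption made here because it reproduces the reported upper limit of 5.4. The power is obtained by numerical integration of the density (9.2) rather than by interpolation or charts, so it need not match the text's 0.42 or 0.39 exactly. Python and all function names are choices made here.

```python
from math import exp, lgamma, log

# Values quoted in Example 9.1.
s1_sq, s2_sq = 8.9, 4.6
v1, v2 = 15, 19
ratio = s1_sq / s2_sq

# Part (b): interval (9.11) from tabled points (2.78 is an assumed reading).
f_025 = 2.62            # F_.025(15, 19)
f_975 = 1 / 2.78        # F_.975(15, 19) = 1 / F_.025(19, 15), Eq. (9.5)
ci = (ratio / f_025, ratio / f_975)

def f_density(x, a, b):
    """Density of F(a, b), Eq. (9.2), evaluated through logarithms."""
    if x <= 0.0:
        return 0.0
    log_const = (lgamma((a + b) / 2) - lgamma(a / 2) - lgamma(b / 2)
                 + (a / 2) * log(a / b))
    return exp(log_const + (a / 2 - 1) * log(x)
               - ((a + b) / 2) * log(1 + a * x / b))

def upper_tail(x, a, b, steps=4000):
    """P[F(a, b) > x] by Simpson's rule on [0, x]; intended for a > 2."""
    h = x / steps
    total = f_density(0.0, a, b) + f_density(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_density(i * h, a, b)
    return 1.0 - total * h / 3

def power(lam_sq, f_alpha=2.23):
    """Eq. (9.14): P[F(v1, v2) > F_alpha / lambda^2] for the one-sided test."""
    return upper_tail(f_alpha / lam_sq, v1, v2)

beta_at_2 = 1 - power(2.0)   # type 2 error probability when lambda^2 = 2
print([round(c, 2) for c in ci], round(power(2.0), 2), round(beta_at_2, 2))
```

Evaluating `power` over a grid of $\lambda^2$ values gives the points of the power curve directly, as an alternative to solving Eq. (9.16) from tabled percentage points.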

9.4. TESTS FOR THE EQUALITY OF K POPULATION VARIANCES 

In Sect. 9.2 we discussed an exact method for testing the equality of two 
normal population variances and indicated a few places in which this test 
might be required. Often it is not convenient to restrict the problem to 
two variances. Thus, we now consider tests of the equality of k normal 
population variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$, i.e., tests of

$$H_0\colon \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 \tag{9.17}$$

These tests are based on variances $s_1^2, s_2^2, \ldots, s_k^2$ with $\nu_1, \nu_2, \ldots, \nu_k$ degrees
of freedom, which are computed from random samples taken independently
from normal populations with variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$, respectively. It
should also be noted that, in testing the null hypothesis $H_0\colon \mu_1 = \mu_2$ by
the methods of Chap. 8, we needed to know whether the population variances
$\sigma_1^2$ and $\sigma_2^2$ are equal. A preliminary test of the equality of variances described
in Sect. 9.2 could be made in order to decide whether to pool the sample
variances before applying the appropriate $t$ test to $\mu_1 = \mu_2$. There are
similar situations (explained in Chaps. 10 and 11) in which we need to know
whether $k$ sample variances may be pooled to obtain a single variance
estimator with an increased number of degrees of freedom.

Several tests have been proposed for testing the hypothesis (9.17). For
a fixed $\alpha$ we should select, if possible, that test for which the probability of
committing a type 2 error is a minimum. This is difficult, since the size of
$\beta$ depends on the particular form of the alternative hypothesis, i.e., the
ways in which we think Eq. (9.17) may be wrong, as well as the construction
of the test. For example, if Eq. (9.17) does not hold, it may be true that
the variances are more or less randomly scattered; that $k - 1$ variances
are equal, and the $k$th variance differs from these by a large amount; or that
about half the variances are equal to $\sigma^2$ and the remainder are equal to
$\sigma^2 + \lambda$, where $\lambda \neq 0$ is some real number.

When it is thought that the population variances are not equal but more
or less randomly scattered, the test most often used is called Bartlett's test
[3] and is described in many places [1, 3, 5, 15, 16]. Actually, Bartlett modified
the likelihood ratio test proposed by Neyman and Pearson [19] in constructing
his test. The test statistic is

$$B = \frac{M}{C} \tag{9.18}$$

where

$$M = \nu \ln s^2 - \sum_{i=1}^{k} \nu_i \ln s_i^2, \qquad \nu = \sum_{i=1}^{k} \nu_i, \qquad s^2 = \frac{\sum_{i=1}^{k} \nu_i s_i^2}{\nu},$$

and

$$C = 1 + \frac{\sum_{i=1}^{k} \frac{1}{\nu_i} - \frac{1}{\nu}}{3(k - 1)} \tag{9.19}$$

If each $\nu_i \geq 5$, the chi-square distribution with $k - 1$ degrees of freedom
serves as a satisfactory approximation to the distribution of $B$; otherwise,
percentage points of $M = BC$ may be obtained from tables computed by
Merrington and Thompson [18]. In any case, if $B$ is sufficiently large
[$B > \chi_\alpha^2(k - 1)$ or $M = CB >$ value tabled by Merrington and Thompson],
we reject Eq. (9.17), the hypothesis of equal variances.

Note. Since $C > 1$, it is unnecessary to compute $C$ when $M < \chi_\alpha^2(k - 1)$.
In Example 9.2 we illustrate Bartlett's test for equal sample sizes.

Example 9.2. Suppose four manufacturers, A, B, C, and D, make the
same gauge of copper wire. Suppose ten measurements of tensile strength 
are made on the wire produced by each manufacturer. The measurements 
in pounds, after subtracting 5000 pounds from each, are shown in Table 9.1, 
along with means, variances, and common logarithms of the variances of 
the samples. The problem is to determine if the manufacturers make wire 
of the same tensile strength, if it is assumed that all conditions of the manu- 
facturing process are the same for this gauge of copper wire. We use Bartlett’s 
test to compare the population variances. In Chap. 10 we consider the 
problem of comparing the population means. 


Table 9.1

Tensile Strength in Pounds, minus 5000 Pounds,
of 10 Samples of the Same Gauge of Copper Wire

                              Manufacturer
Sample                  A          B          C          D
                            ($x_{ij} - 5000$)
   1                   110        130        100         70
   2                    90         45        200         40
   3                   120         50         90        100
   4                   130         40         70        180
   5                   115         45         90         40
   6                   105         55        130        150
   7                    50         65         80        200
   8                    75        120         70        210
   9                    85         50         80        220
  10                    40        150        150        250

$\bar{x}_i - 5000$            92         75        106        146
$\bar{x}_i$                 5092       5075       5106       5146
$s_i^2$                      895       1717       1760       6093      $s^2$ = 2616.25
$\log s_i^2$             2.95182    3.23477    3.24551    3.78483      $\log s^2$ = 3.41768

For the case where $n_1 = n_2 = \cdots = n_k = n$, the $B$ statistic becomes

$$B = \frac{(n - 1) \left( k \ln s^2 - \sum_{i=1}^{k} \ln s_i^2 \right)}{C} \tag{9.20}$$

where

$$C = 1 + \frac{k + 1}{3k(n - 1)} \qquad \text{and} \qquad s^2 = \frac{\sum_{i=1}^{k} s_i^2}{k} \tag{9.21}$$

In particular, since $\ln a = \log_{10} a / \log_{10} e = 2.30259 \log_{10} a$, we have

$$BC = (10 - 1)(2.30259)[4(3.41768) - (2.95182 + \cdots + 3.78483)] = 9.4040$$

$$C = 1 + \frac{4 + 1}{3(4)(10 - 1)} = 1.0463$$

and, therefore

$$B = \frac{9.4040}{1.0463} = 8.99$$

Since $\chi_{.05}^2(3) = 7.81$ and $\chi_{.01}^2(3) = 11.34$, we reject the equality of variance
hypothesis at the five per cent level, but fail to reject this hypothesis at the
one per cent level. Note that for a one per cent level test we need not compute
$C$, since $BC = 9.40$ is already less than the significant value $\chi_{.01}^2(3) = 11.34$.
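
The computation in Example 9.2 can be repeated from the raw measurements of Table 9.1 using the general formulas (9.18) and (9.19). The following is a sketch only (Python, with names chosen here for illustration); because it works with unrounded variances and natural logarithms, its results differ slightly in the later decimal places from the hand computation with five-place common logarithms.

```python
from math import log

# Table 9.1: tensile strength minus 5000 lb, ten measurements per manufacturer.
data = {
    "A": [110, 90, 120, 130, 115, 105, 50, 75, 85, 40],
    "B": [130, 45, 50, 40, 45, 55, 65, 120, 50, 150],
    "C": [100, 200, 90, 70, 90, 130, 80, 70, 80, 150],
    "D": [70, 40, 100, 180, 40, 150, 200, 210, 220, 250],
}

def sample_variance(xs):
    """Unbiased sample variance with len(xs) - 1 degrees of freedom."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

variances = [sample_variance(xs) for xs in data.values()]
dfs = [len(xs) - 1 for xs in data.values()]       # nu_i
k = len(variances)
v = sum(dfs)                                      # nu = sum of the nu_i
pooled = sum(d * s for d, s in zip(dfs, variances)) / v

# Bartlett's statistic, Eqs. (9.18) and (9.19).
M = v * log(pooled) - sum(d * log(s) for d, s in zip(dfs, variances))
C = 1 + (sum(1 / d for d in dfs) - 1 / v) / (3 * (k - 1))
B = M / C

print([round(s, 1) for s in variances], round(M, 2), round(C, 4), round(B, 2))
```

With three degrees of freedom, the resulting $B$ lies between the five per cent point 7.81 and the one per cent point 11.34, in agreement with the conclusion of the example.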
If it is suspected that hypothesis (9.17) fails to hold because exactly one
population variance is appreciably larger than the other $k - 1$ variances,
then a test developed by Cochran [8, 23] when $\nu_1 = \cdots = \nu_k = \nu$ is more
appropriate than Bartlett's test. The test statistic is

$$g = \frac{\text{largest of the } s_i^2}{\sum_{i=1}^{k} s_i^2} \tag{9.22}$$

Tables of $g_\alpha(k, \nu)$ for which $P[g \geq g_\alpha(k, \nu)] = \alpha$ are given in Chap. 15
of [23]. Thus, in Example 9.2, if the experimenter had suspected before
making measurements that, in case Eq. (9.17) failed, the variance of
manufacturer D was greater than the variances for manufacturers A, B, and C,
then Cochran's test should be applied.

Hartley [17] proposed that

$$F_{\max} = \frac{\text{largest of the } s_i^2}{\text{smallest of the } s_i^2} \tag{9.23}$$

be used to test hypothesis (9.17). Other tests, sequential tests, have been
proposed by Girshick [14], Wald [25], and Cox [9].
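
For the data of Example 9.2 the two statistics (9.22) and (9.23) are quickly obtained from the sample variances. In the sketch below (Python, illustrative names) the variances are entered as tabled to the nearest pound, with the value for manufacturer B taken as 1717, the figure consistent with its tabled logarithm 3.23477. Critical values are not computed here; they would have to be read from the tables cited in the text.

```python
# Sample variances of Table 9.1 for manufacturers A, B, C, D
# (B entered as 1717, the value consistent with log s^2 = 3.23477).
variances = [895, 1717, 1760, 6093]

# Cochran's statistic, Eq. (9.22): largest variance over the sum of variances.
g = max(variances) / sum(variances)

# Hartley's statistic, Eq. (9.23): largest variance over smallest variance.
f_max = max(variances) / min(variances)

print(round(g, 4), round(f_max, 2))
```

Both statistics assume equal degrees of freedom for all $k$ variances, here $\nu = 9$ with $k = 4$.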

If the populations are known to be normal or almost normal, all tests
which have been mentioned involving two or more variances are appropriate.
However, if the populations are not normal these tests are very impractical,
since rejection of the hypothesis (9.17) could mean that the population
variances are unequal, that the populations are not normal, or both. That
is, these tests are very sensitive to nonnormality (recall that the test of
$\mu_1 = \mu_2$ is fairly insensitive to nonnormality). Thus, for example, one should
not use any of the above-mentioned tests as preliminary tests of equality
of variances before testing the equality of means whenever there is doubt
concerning the normality of the populations. Concerning this point, Box
[6, p. 333] writes, "To make the preliminary test on variances is rather like
putting to sea in a rowing boat to find out whether conditions are sufficiently
calm for an ocean liner to leave port!" Efforts have been made, with some
success, by Bartlett and Kendall [4] and Odeh and Olds [21] to find a test
of hypothesis (9.17) which is not so sensitive to nonnormality.


9.5. SPECIAL CASES OF THE F DISTRIBUTION 


In Sect. 9.2 we noted that $\chi^2(\nu_2)/\nu_2$ converges in probability to 1 as $\nu_2$
approaches infinity. Thus

$$ F(\nu_1, \nu_2) = \frac{\chi^2(\nu_1)/\nu_1}{\chi^2(\nu_2)/\nu_2} $$

converges in probability to

$$ F(\nu_1, \infty) = \frac{\chi^2(\nu_1)}{\nu_1} \tag{9.24} $$

as $\nu_2$ approaches infinity. That is, the $\chi^2$ per degree of freedom distribution
with $\nu_1$ degrees of freedom is a special case of the F distribution with $\nu_1$
and $\infty$ degrees of freedom. So for any $\alpha$

$$ F_\alpha(\nu_1, \infty) = \frac{\chi^2_\alpha(\nu_1)}{\nu_1} \tag{9.25} $$

In particular, using Tables VII and V, we see that

$$ F_{.05}(10, \infty) = 1.8307 = \frac{\chi^2_{.05}(10)}{10} $$

Further, as $\nu_1$ approaches infinity, F converges in probability to

$$ F(\infty, \nu_2) = \frac{1}{\chi^2(\nu_2)/\nu_2} = \frac{\nu_2}{\chi^2(\nu_2)} \tag{9.26} $$

It follows that

$$ F_\alpha(\infty, \nu_2) = \frac{\nu_2}{\chi^2_{1-\alpha}(\nu_2)} \tag{9.27} $$

That is, the reciprocals of lower (left) tail values of $\chi^2$ per degree of freedom
obtained from Table V are special cases of upper (right) tail values of F
obtained in the extreme right-hand corner of Table VII.

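From Example 1.2 we know that when $\nu_1 = 1$, $\chi^2(\nu_1)/\nu_1 = u^2$, where
$u$ is a standard normal deviate. Thus, we may write

$$ F(1, \nu_2) = \frac{u^2}{\chi^2(\nu_2)/\nu_2} \tag{9.28} $$

From Eq. (8.5) we know that

$$ t(\nu_2) = \frac{u}{\sqrt{\chi^2(\nu_2)/\nu_2}} $$

is distributed as Student's $t$ with $\nu_2$ degrees of freedom. Thus, Eq. (9.28) may be written as

$$ F(1, \nu_2) = t^2(\nu_2) \tag{9.29} $$

Since the relation

$$ P[F(1, \nu_2) < F_\alpha(1, \nu_2)] = 1 - \alpha $$

may be written as

$$ P\left[-\sqrt{F_\alpha(1, \nu_2)} < t(\nu_2) < \sqrt{F_\alpha(1, \nu_2)}\right] = 1 - \alpha $$

we see, knowing that the $t$ distribution is symmetric about $t = 0$, that

$$ P\left[t(\nu_2) < -\sqrt{F_\alpha(1, \nu_2)}\right] = P\left[t(\nu_2) > \sqrt{F_\alpha(1, \nu_2)}\right] = \frac{\alpha}{2} $$

Thus, it follows that

$$ t_{\alpha/2}(\nu_2) = \sqrt{F_\alpha(1, \nu_2)} \tag{9.30} $$

or

$$ F_\alpha(1, \nu_2) = t^2_{\alpha/2}(\nu_2) \tag{9.31} $$

Also, for $\nu_2 = 1$ it can be shown that

$$ F_\alpha(\nu_1, 1) = \frac{1}{t^2_{(1+\alpha)/2}(\nu_1)} \tag{9.32} $$

Finally, if $\nu_1 = 1$ and $\nu_2$ approaches infinity, Eq. (9.30) becomes

$$ F_\alpha(1, \infty) = u^2_{\alpha/2} \tag{9.33} $$

and if $\nu_2 = 1$ and $\nu_1$ approaches infinity, Eq. (9.32) becomes

$$ F_\alpha(\infty, 1) = \frac{1}{u^2_{(1+\alpha)/2}} \tag{9.34} $$

where $u_{\alpha/2}$ and $u_{(1+\alpha)/2}$ are points of the standard normal variate such that

$$ P[u > u_{\alpha/2}] = \frac{\alpha}{2} \quad \text{and} \quad P[u > u_{(1+\alpha)/2}] = \frac{1+\alpha}{2} \tag{9.35} $$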


These results are brought together in Table 9.2. It should be observed that
$\chi^2$, $t$, and $u$ percentage points are special cases of F values which appear
along the borders of the $\alpha$ level F table.


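Table 9.2

Percentage Points $F_\alpha$ of the $F(\nu_1, \nu_2)$ Distribution
for which $P[F(\nu_1, \nu_2) > F_\alpha] = \alpha$

  $\nu_2$ \ $\nu_1$      1                          ... $\nu_1$ ...                       $\infty$

  1               $t^2_{\alpha/2}(1)$        $1/t^2_{(1+\alpha)/2}(\nu_1)$         $1/u^2_{(1+\alpha)/2}$

  $\nu_2$          $t^2_{\alpha/2}(\nu_2)$    ... $F_\alpha(\nu_1, \nu_2)$ ...      $\nu_2/\chi^2_{1-\alpha}(\nu_2)$

  $\infty$         $u^2_{\alpha/2}$           ... $\chi^2_\alpha(\nu_1)/\nu_1$ ...  1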
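
Two of the corner entries of Table 9.2 involve only the standard normal distribution, so they can be checked with the Python standard library. This is a sketch for illustration; the helper name `upper_normal_point` is hypothetical, and the chi-square and t entries are not checked because the standard library has no quantile functions for those distributions.

```python
from statistics import NormalDist

def upper_normal_point(p):
    """Return u_p such that P[u > u_p] = p for a standard normal u."""
    return NormalDist().inv_cdf(1 - p)

alpha = 0.05
# Eq. (9.33): F_alpha(1, infinity) = u^2_{alpha/2}
f_1_inf = upper_normal_point(alpha / 2) ** 2
# Eq. (9.34): F_alpha(infinity, 1) = 1 / u^2_{(1+alpha)/2}
# (the point itself is negative, but only its square is used)
f_inf_1 = 1 / upper_normal_point((1 + alpha) / 2) ** 2
print(round(f_1_inf, 3), round(f_inf_1, 1))
```

The two values can be compared with the lower-left and upper-right corners of a five per cent F table.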


9.6. EXERCISES 

9.1. (a) Use Eq. (9.2) to find the density functions of F(2, 4), F(4, 2), F(4, 4),
and F(2, 6). (b) Graph the density function of F(2, 4) and F(4, 2). (c)
Use the definition to find the means of the random variables F(2, 4) and
F(4, 4). Check results, using Theorem 9.1. (d) Use the definition to find
the variance of the random variable F(2, 4). Check results, using Theorem
9.1. (e) Without tables, find $F_{.05}(2, 4)$ and $F_{.05}(4, 4)$ and compare with
the values in Table VII.

9.2. Suppose two samples of 12 and 20 objects, respectively, have variances
$s_1^2 = 20$ and $s_2^2 = 12$. (a) Is there reason to doubt that the samples are
from populations having equal variances? Use a five per cent level test.
(b) Find 95 per cent confidence limits for the ratio $\sigma_1^2/\sigma_2^2$; for the ratio
$\sigma_1/\sigma_2$.

9.3. (a) Use the transformation 


(0<y<l) (9.36) 

to express the density function (9.2) as a density function $f(y)$ in terms of
$y$. Compare this new density function with Eq. (3.68) of Exercise 3.70.
We call $f(y)$ the beta density function. As we mentioned before, Pearson
has extensively tabulated percentage points of the beta distribution.
(b) Use Exercise 3.70(b) to show that the mean and variance of F are
as given in the statement of Theorem 9.1.


9.4. (a) Use Eq. (9.16) and a method similar to that employed in preparing
Fig. 7.3 to draw a power curve for the one-sided test of $\sigma_1^2 = \sigma_2^2$ against
$\sigma_1^2 > \sigma_2^2$ when $\nu_1 = 10$, $\nu_2 = 8$, and $\alpha = 0.05$. (b) Use this curve to find
the probability of making the type 2 error if $\sigma_1^2 = 3\sigma_2^2$.

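9.5. Since in Theorem 9.1 $\chi_1^2$ and $\chi_2^2$ are independently distributed, it follows,
when Eq. (7.1) is used, that their joint density function $f$ is given
by $f(\chi_1^2) \cdot f(\chi_2^2)$. The probability element is

$$ f(\chi_1^2) f(\chi_2^2)\, d\chi_1^2\, d\chi_2^2 = \frac{(\chi_1^2)^{\nu_1/2 - 1} (\chi_2^2)^{\nu_2/2 - 1} e^{-(\chi_1^2 + \chi_2^2)/2}}{2^{(\nu_1+\nu_2)/2}\, \Gamma(\nu_1/2)\, \Gamma(\nu_2/2)}\, d\chi_1^2\, d\chi_2^2 \tag{9.37} $$

Solving for $\chi_1^2$ in Eq. (9.1) gives

$$ \chi_1^2 = \frac{\nu_1}{\nu_2} F \chi_2^2 $$

which, when substituted in Eq. (9.37), yields

$$ f(F, \chi_2^2) = \frac{(\nu_1/\nu_2)^{\nu_1/2}\, F^{\nu_1/2 - 1}\, (\chi_2^2)^{(\nu_1+\nu_2)/2 - 1}\, e^{-(\chi_2^2/2)(1 + \nu_1 F/\nu_2)}}{2^{(\nu_1+\nu_2)/2}\, \Gamma(\nu_1/2)\, \Gamma(\nu_2/2)} \tag{9.38} $$

On integrating Eq. (9.38) over the domain of $\chi_2^2$ $(0 < \chi_2^2 < \infty)$, we
obtain the marginal density function given by Eq. (9.2). Prove Eq. (9.38)
and then Eq. (9.2).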

9.6. Prove Eqs. (9.27) and (9.32).

9.7. A new method and an old method for counting bacteria in rat feces
were to be compared. Films were made by both methods, and the resulting
slides were fixed and stained with crystal violet. Twenty-five random
fields were examined with a microscope and the number of bacteria
counted in each field for each film. The results shown in Table 9.3 were
obtained.*


Table 9.3


O/d Method 

I 2 n 4 T 

S 6 0 

n 9 15 0 

U 0 13 15 

0 4 44 32 


0 16 

7 f 76 
7 29 

49 21 

to 26 


Mew Method 

"28 n S’ 

20 2J 27 

23 24 n 

21 41 27 

27 23 28 


28 

23 

42 

29 

21 


* Wallace, R. H., "A Direct Method for Counting Bacteria in Feces," Journal of
Bacteriology, Vol. 64 (1952), 593-594.

(a) Test the hypothesis that the variances are equal against the alternative
hypothesis that the variance for the new method is smaller than the
variance for the old method. (b) Find a 90 per cent confidence interval
for the ratio of the variance of the old method to the variance of the
new method. (c) Find the power function for the test in (a). Use the
power function to compare the variances.

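9.8. (a) The distribution of $z = \tfrac{1}{2} \log_e F$, studied by R. A. Fisher, is more
nearly normal than the F distribution. Show that

$$ f(z)\, dz = 2C e^{\nu_1 z} \left(1 + \frac{\nu_1}{\nu_2} e^{2z}\right)^{-(\nu_1+\nu_2)/2} dz, \qquad -\infty < z < \infty \tag{9.39} $$

(b) Find C and show that $f(z)$ is symmetrical if $\nu_1 = \nu_2$. (c) It can be shown
that $z$ is approximately normally distributed with mean and variance
given by

$$ \mu_z = \frac{1}{2}\left(\frac{1}{\nu_2} - \frac{1}{\nu_1}\right) \quad \text{and} \quad \sigma_z^2 = \frac{1}{2}\left(\frac{1}{\nu_1} + \frac{1}{\nu_2}\right) $$

These two relations make possible a quick comparison of two variances
when one computes

$$ u = \frac{z - \mu_z}{\sigma_z} $$

Use this method to answer Exercise 9.7(a).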

9.9. The dispersions of tensile strength of iron, as measured by testing
machines at seven foundries, were to be compared. Six bars were poured
at each foundry under conditions as nearly alike as possible. The
measurements were made in tons per square inch. The data along with the type
of meehanite metal cast and the type of test machine are given in Table
9.4. (a) Test the equality of variances (measures of dispersion thought
to be appropriate), using Bartlett's test. (b) Test the equality of variances,
using either Cochran's or Hartley's test. (c) Test the equality of variances,
using Merrington and Thompson's tables (p. 198 of Bennett and
Franklin). (d) Test the equality of variances of the Avery testing machines.
(e) Test the equality of variances of the Buckton testing machines. (f)
In how many ways can seven variances $\sigma_1^2, \ldots, \sigma_7^2$ fail to be equal?
(Only the fact that two variances are not equal is important in this
question. Magnitude of difference is not to be taken into account.)
Find a similar solution for $k$ variances.


Table 9.4*

Foundry                      1        2        3        4        5        6        7
Type Meehanite              on       GC       GA       GD       GC       GD       GB
Testing Machine          Avery  Buckton    Avery  Buckton  Buckton  Buckton    Avery

Tensile (tons/sq in.)
  Bar 1                  17.70    18.40    25.25    17.00    19.86    17.55    23.00
      2                  18.00    19.20    26.47    15.75    20.00    17.68    22.70
      3                  17.93    19.84    25.35    18.90    19.29    17.80    21.80
      4                  16.63    19.16    23.26    17.50    18.11    17.26    22.60
      5                  17.06    19.04    24.85    20.00    19.11    17.43    22.00
      6                  17.46    19.72    22.20    17.70    18.42    17.40    21.70

Mean $\bar{x}_i$          17.46    19.23    24.56    17.81    19.13    17.52    22.30
Variance $s_i^2$         0.2838   0.2686   2.4184   2.1984   0.5720   0.0390   0.2880
$s_i$                     0.53     0.52     1.56     1.48     0.76     0.20     0.54

* Bennett and Franklin, Statistical Analysis in Chemistry and the Chemical
Industry. New York: John Wiley & Sons, Inc., 1954, p. 199.

9.10. (a) What is the minimum sample size $n_1 = n_2 = n$ one would take if
it is required in a five per cent level one-sided test of the null hypothesis
$\sigma_1^2 = \sigma_2^2$ that $\beta$ be not greater than 0.10 when $\sigma_1^2 = 3\sigma_2^2$ is true? (b) What
should the common sample size in (a) be for a two-sided test?


9.11. An outline of the proof that B in Eq. (9.15) is approximately distributed
as $\chi^2$ with $k - 1$ degrees of freedom is given by Anderson and Bancroft
[1, pp. 142-144]. Fill in the details of this proof.

It should be noted that an infinite set of constants $\gamma_r$, called cumulants,
are used in place of moments to characterize the distribution function.
The cumulants are defined by the generating function

$$ C(t) = \log M(t) \tag{9.41} $$

where $M(t)$ is the moment generating function. Thus, the $r$th cumulant is
given by

$$ \gamma_r = \left.\frac{d^r C(t)}{dt^r}\right|_{t=0} \tag{9.42} $$

It can be shown that $\gamma_1 = \mu$ and $\gamma_2 = \sigma^2$.

Further, Stirling's formula for approximating factorials includes one
more term than the approximation given in Exercise 3.67. The extended
formula is

$$ \Gamma(a + 1) \approx \sqrt{2\pi}\, a^{a + 1/2} e^{-a} \left(1 + \frac{1}{12a}\right) \tag{9.43} $$

9.12. The sample sizes and variances for seven nonselected strains of guayule
in the 54 ± chromosome group, according to the data taken from
Federer*, Table 9.5, are as follows.


Table 9.5


"i ir H9 117 115 119 1J6 116 

if 928 6 80 7 26 7 41 999 14 02 1080 


(a) Use Bartlett's test to determine if the population variances are
homogeneous. (b) Test the homogeneity of variances, using any other test of
variances which you consider appropriate.

* W. T. Federer, "Variability of Certain Seed, Seedling, and Young Plant Characters
of Guayule," U.S. Dept. Agr. Tech. Bull. 919 (1946).

ANALYSIS OF VARIANCE:
ONE-WAY CLASSIFICATION


The analysis of variance is described as it relates to the sum of squares 
identity. It is demonstrated how several sources of variation may be isolated, 
estimated, and tested. Techniques are discussed in relation to models and 
assumptions. The role of assumptions, sums of squares identities, and 
models is emphasized throughout. The concept of a single degree of freedom 
and its relation to the sum of squares identity, hypothesis testing, and 
confidence intervals is discussed. The distribution of the range of sample 
means is introduced and used in testing hypotheses and constructing
simultaneous confidence intervals. Scheffé's method for constructing simultaneous
confidence intervals is presented. Duncan’s multiple test procedure is 
explained. 

10.1. INTRODUCTION

We have already mentioned (Chaps. 8 and 9) that it is sometimes necessary
to extend experiments to the comparison of several means. One might
think that this can be done by testing the differences between each pair
from among $k$ means by using the $t$ test repeatedly and, indeed, this is the
case. However, there are some disadvantages to this procedure. First, it
places the emphasis on pairwise comparisons, when actually the experimenter
may be interested in comparisons involving $p$ $(p = 3, \ldots, k)$ means. Second,
it causes difficulty with the significance level when the experimenter is
really interested in testing one or more hypotheses of the form
$H_0: \mu_1 = \mu_2 = \cdots = \mu_p$. Suppose, for example, that one wishes to test
this hypothesis when $p = k = 6$, say. If we used the $t$ test on pairs, it would be
possible to make $\binom{6}{2} = 15$ tests. Suppose a five per cent level $t$ test is used for each
difference. In this case, the chance of saying that some one or more of the
differences is significant might be as large as $1 - (1 - 0.05)^{15} = 0.54$. That
is, for five per cent level pairwise tests of the hypothesis $\mu_1 = \mu_2 = \cdots = \mu_6$
the significance level might actually be as large as $\alpha$ = 54 per cent. This
means that the hypothesis will be rejected as often as 54 per cent of the
time when it is true. (This problem is discussed in some detail in Sect. 10.9.)
Third, there is a loss in precision in estimating the variance if only the
measurements of the two samples being compared are utilized.
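
The 54 per cent figure is easy to reproduce. The short sketch below is illustrative only; the function name `max_familywise_error` is hypothetical, and the calculation treats the pairwise tests as if they were independent, which is why the text describes the result as the largest value the significance level might reach.

```python
from math import comb

def max_familywise_error(k, alpha):
    """Return (m, 1 - (1 - alpha)^m) for the m = C(k, 2) pairwise tests
    among k means, computed as if the tests were independent."""
    m = comb(k, 2)
    return m, 1 - (1 - alpha) ** m

m, rate = max_familywise_error(6, 0.05)
print(m, round(rate, 2))
```

Changing `k` shows how quickly the figure grows with the number of means being compared.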

If one is interested in testing a single hypothesis of the form
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, the F distribution is most often used. This requires
two independent estimates of the variance $\sigma^2$ common to $k$ normal
populations from which $k$ random and independent samples of size
$n_i$ $(i = 1, \ldots, k)$, respectively, are drawn. Thus, the test involves analyzing
variances. That is, the treatment of the data involves separating the variance
of all observations into parts, each measuring variation which is attributed
to specific causes.

The method of analysis of variance, an arithmetical process for splitting
up a total variance into its component parts, has a much wider application
than the simple generalization already noted, and is probably the most
powerful procedure in the field of experimental statistics. We start our
discussion of analysis of variance with the simplest case.

10.2. ANALYSIS OF VARIANCE FOR ONE-WAY CLASSIFICATION:
APPLICATION

Measurements of a quantity can often be classified into categories
corresponding to different conditions in an experiment. For example, the
measurements resulting from 40 analyses of the concentration of iron in
a standard solution may be classified into five categories of eight
determinations corresponding to eight measurements made by each of five analysts.
The measurements, in this example, fall into five mutually exclusive
categories, where each category represents the population of all possible results
which could be obtained by a particular analyst under constant conditions,
and the eight specific results represent a sample from the population. If
it is assumed that each analyst obtains measurements independent of the
other analysts, the classification of observations is considered a one-way
classification, and we say that there is a single variable of classification.
In the example, the variable of classification is analyst, and the variable
has five values. Since there is usually unexplained variation from one
observation to another within a given category, we call the observations values
of a random variable. There is one random variable for each category.

The data from an experiment are usually arranged in a rectangular
array for ease of computation. A column of numbers represents measure- 
ments for one category. Following the usual convention, the different 
categories of measurements are often referred to as treatments as well as 
columns of measurements. Thus, when we refer to different treatments, we 
may be thinking of different schools, different analysts, different fertilizers, 
different concentrations of a solution, different methods, etc. Methods of 
estimating parameters and testing hypotheses in a one-way classification 
are given in Example 10.1 along with a useful notation. Certain theoretical 
justifications follow the example. 

Example 10.1. (a) Use the coded data in Example 9.2 to find estimates
of the means and effects for manufacturers A, B, and C. (b) Use a five
per cent level test to determine if manufacturers A, B, and C, on the
average, make copper wire of the same tensile strength. It is assumed
that all conditions of the manufacturing process are as near the same as
possible. (c) Find 90 per cent confidence limits for the population means
$\mu_A$, $\mu_B$, and $\mu_C$.

If the samples are assumed to be randomly and independently drawn,
the estimates $\bar{x}_A, \bar{x}_B, \bar{x}_C$ and $a_A, a_B, a_C$ shown in Table 10.1 are the best
(in the sense of being unbiased and with minimum variance) point estimates
of the true means $\mu_A, \mu_B, \mu_C$ and true effects $\alpha_A, \alpha_B, \alpha_C$, respectively. The

Table 10.1

Tensile Strength of Copper Wire in Pounds

Manufacturer          A                        B                        C
Sample                $x_1 = x_1' - 5000$      $x_2 = x_2' - 5000$      $x_3 = x_3' - 5000$

 1                    $x_{11} = 110$           $x_{21} = 130$           $x_{31} = 100$
 2                    $x_{12} = 90$            $x_{22} = 45$            $x_{32} = 200$
 3                    120                      50                       90
 4                    130                      40                       70
 5                    115                      45                       90
 6                    105                      55                       130
 7                    50                       65                       80
 8                    75                       120                      70
 9                    85                       50                       80
10                    $x_{1,10} = 40$          $x_{2,10} = 150$         $x_{3,10} = 150$

$T_{i\cdot}$          $T_A = T_{1\cdot} = 920$     $T_B = T_{2\cdot} = 750$     $T_C = T_{3\cdot} = 1060$     $T_{\cdot\cdot} = 2730$
$\bar{x}_{i\cdot}$    $\bar{x}_A = \bar{x}_{1\cdot} = 92$     $\bar{x}_B = \bar{x}_{2\cdot} = 75$     $\bar{x}_C = \bar{x}_{3\cdot} = 106$     $\bar{x} = \bar{x}_{\cdot\cdot} = 91$
$a_i$                 $a_A = a_1 = 1$          $a_B = a_2 = -16$        $a_C = a_3 = 15$


notation and methods are the same as those introduced in Sect. 8.11, except
that $i$ takes values A, B, and C instead of 1, 2, and 3, and

$$ T_{i\cdot} = \sum_{j=1}^{n_i} x_{ij} \tag{10.1} $$

and

$$ T_{\cdot\cdot} = \sum_{i=1}^{k} T_{i\cdot} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij} \tag{10.2} $$

We call $T_{i\cdot}$ the total of the observations in the $i$th column (or for the $i$th
treatment) and $T_{\cdot\cdot}$ the grand total of all observations. For the coded data
the estimate of the over-all mean is 91, and the estimates of the means of
populations A, B, and C are 92, 75, and 106, respectively. Thus, for the
original data the estimates of the over-all mean and the means of populations
A, B, and C are, respectively, 5091, 5092, 5075, and 5106. In either case,
the estimates of the effects of manufacturers A, B, and C are 1, -16, and
15, respectively. This points up one advantage of using effects, namely,
that adding or subtracting a constant to all the observations does not change
the estimates of the effects. Observe that

$$ a_A + a_B + a_C = 1 - 16 + 15 = 0 $$

We use the general procedure, given on p. 218, to test the hypothesis
in part (b) of the example.

1. The null hypothesis is $H_0: \mu_A = \mu_B = \mu_C$, and the alternative
hypothesis is, "All the means are not equal." The alternative hypothesis
for the F test is not $\mu_A \ne \mu_B \ne \mu_C$, for $\mu_B$ might be the same as $\mu_A$,
and both might be different from $\mu_C$. In general, if the null hypothesis
$\mu_1 = \mu_2 = \cdots = \mu_k$ is rejected, it does not necessarily indicate that
$\mu_i$ $(i = 1, \ldots, k)$ is different from $\mu_j$ $(j = 1, \ldots, k;\ i \ne j)$; it may
only indicate that some linear combination of one subset of the
$\mu$'s is different from a linear combination of some of the remaining
$\mu$'s. That is, a significant F may indicate that $(\mu_1 + \mu_2)/2$ is
different from $(\mu_3 + \mu_4)/2$, say.

2. The significance level is $\alpha = 0.05$. Assume that the three samples of
common size 10 were randomly and independently drawn from three
normal populations with common variance $\sigma^2$.

3. The statistic for the test is $F(2, 27) = s_2^2/s_p^2$, where, according to Eq.
(8.54),

$$ s_2^2 = \frac{\sum_{i} n_i a_i^2}{k - 1} \quad \text{and} \quad s_p^2 = \frac{\sum_{i} (n_i - 1) s_i^2}{\sum_{i} (n_i - 1)} $$

are unbiased estimators of $\sigma^2$ with two and 27 degrees of freedom,
respectively. $s_2^2$ is an unbiased estimator of $\sigma^2$ only if $H_0$ is true, but
$s_p^2$ is unbiased so long as the samples are randomly and independently
drawn.

4. The critical region is made up of all those values of F for which
$F > F_{.05}(2, 27) = 3.35$. The critical region is the upper tail of the
F distribution, since failure of $H_0$ makes $s_2^2$ on the average larger
than $\sigma^2$.

5. From Table 10.1, $a_A = 1$, $a_B = -16$, and $a_C = 15$. Thus

$$ s_2^2 = \frac{10[1^2 + (-16)^2 + 15^2]}{2} = 10(241) = 2410 $$

From Table 9.1, $s_A^2 = 895$, $s_B^2 = 1717$, and $s_C^2 = 1760$. Thus, the pooled
or error variance is

$$ s_p^2 = \frac{9(895) + 9(1717) + 9(1760)}{27} = \frac{4372}{3} = 1457.3 $$

Hence, the computed F statistic $F_c$ is

$$ F_c = \frac{2410}{1457.3} = 1.65 $$

6. Since $F_c = 1.65$ does not fall in the critical region, we fail to reject
$H_0$ at the five per cent level of significance. That is, we reserve
judgment about the relative strengths of wire manufactured by A, B,
and C until we have further evidence to the contrary. Hence, if we
wish to take action as a consequence of the test, we act as though
the three manufacturers make equally good wire.

The numerical calculations for this type of problem are often summarized 
in what is known as an analysis of variance table, shown in Table 10.2. 


Table 10.2
Analysis of Variance

Source of          Sum of     Degrees of    Mean      Computed    Critical
Variation          Squares    Freedom       Square    F           F

Among means         4,820        2          2410      1.65        3.35
Within (error)     39,347       27          1457.3
Total              44,167       29

The proper conclusion can be read directly from this table, provided the 
assumptions hold. 
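
The entries of Table 10.2 can be reproduced directly from the coded observations of Table 10.1. The sketch below is illustrative; the function name `one_way_anova` and the dictionary layout are hypothetical choices, not notation from the text. The critical value 3.35 is taken from the F table and is not computed.

```python
wire = {
    "A": [110, 90, 120, 130, 115, 105, 50, 75, 85, 40],
    "B": [130, 45, 50, 40, 45, 55, 65, 120, 50, 150],
    "C": [100, 200, 90, 70, 90, 130, 80, 70, 80, 150],
}

def one_way_anova(groups):
    """Return effects, sums of squares, mean squares, and F for a one-way layout."""
    k = len(groups)
    n_total = sum(len(obs) for obs in groups.values())
    grand_mean = sum(sum(obs) for obs in groups.values()) / n_total
    means = {g: sum(obs) / len(obs) for g, obs in groups.items()}
    # Estimated effect a_i = (sample mean of group i) - (over-all mean)
    effects = {g: means[g] - grand_mean for g in groups}
    ss_among = sum(len(groups[g]) * effects[g] ** 2 for g in groups)
    ss_within = sum((x - means[g]) ** 2 for g, obs in groups.items() for x in obs)
    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {"effects": effects, "ss_among": ss_among, "ss_within": ss_within,
            "ms_among": ms_among, "ms_within": ms_within,
            "f": ms_among / ms_within}

table = one_way_anova(wire)
print(table["effects"], table["ss_among"], table["ss_within"], round(table["f"], 2))
```

Working from the raw observations gives a within sum of squares of 39,350, whereas Table 10.2 shows 39,347 because it was built from the rounded variances 895, 1717, and 1760. The F ratio is the same to two decimals either way.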

To find 100(1 - $\alpha$) per cent confidence limits for mean $\mu_i$ of a normal
population, we need independent estimates $\bar{x}_i$ and $s^2$ of the population
mean and variance along with percentage points of the Student $t$ distribution.
Then the symmetric 100(1 - $\alpha$) per cent confidence limits are given by

$$ \bar{x}_i - t_{\alpha/2}(\nu) \frac{s}{\sqrt{n_i}} < \mu_i < \bar{x}_i + t_{\alpha/2}(\nu) \frac{s}{\sqrt{n_i}} \tag{10.4} $$

where $\nu$ denotes the degrees of freedom of $s^2$ and $n_i$ denotes the size of the
sample $i$. When a random sample is drawn from a single normal population
with mean $\mu_i$ and variance $\sigma^2$, we know that $\bar{x}_i$ and $s_i^2$ are independently
distributed $(i = 1, 2, \ldots, k)$. Further, we know that $s_A^2$, $s_B^2$, and $s_C^2$ are
independent estimators of $\sigma^2$, the common population variance. Thus,
$s_p^2$ is an estimator of $\sigma^2$ which is independent of $\bar{x}_i$. This means that

$$ \frac{\bar{x}_i - \mu_i}{s_p/\sqrt{n_i}} \qquad (i = A, B, C) \tag{10.5} $$

is distributed as the Student $t$ with degrees of freedom of $s_p^2$ being 27 in our
problem. Now, since $t_{.05}(27) = 1.7033$, the 90 per cent confidence limits
of $\mu_i$ are given by

$$ \bar{x}_i \pm 1.7033 \frac{s_p}{\sqrt{10}} \tag{10.6} $$

From the analysis of variance table we get $s_p^2 = 1457.3$. Thus

$$ t_{.05}(27) \frac{s_p}{\sqrt{10}} = 1.7033\sqrt{145.73} = 20.6 $$
Therefore, the 90 per cent confidence intervals for $\mu_A$, $\mu_B$, and $\mu_C$ are,
respectively

    5071.4 lb < $\mu_A$ < 5112.6 lb

    5054.4 lb < $\mu_B$ < 5095.6 lb                (10.7)

    5085.4 lb < $\mu_C$ < 5126.6 lb
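
The three intervals in (10.7) follow mechanically from Eq. (10.6). A minimal sketch of the arithmetic is given below; the variable names are hypothetical, and the value 1.7033 is the tabled percentage point quoted in the text rather than something the code derives.

```python
import math

t_05_27 = 1.7033   # tabled Student t point with 27 degrees of freedom (from the text)
s_p_sq = 1457.3    # pooled (error) variance from the analysis of variance table
n_i = 10           # common sample size

# Half-width of each interval: t * s_p / sqrt(n_i)
half_width = t_05_27 * math.sqrt(s_p_sq / n_i)

coded_means = {"A": 92, "B": 75, "C": 106}
# Add 5000 to undo the coding x = x' - 5000
limits = {g: (5000 + m - half_width, 5000 + m + half_width)
          for g, m in coded_means.items()}

for g, (lower, upper) in limits.items():
    print(f"{lower:.1f} lb < mu_{g} < {upper:.1f} lb")
```

Because every interval uses the same pooled variance and the same sample size, all three have the same width.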

The confidence intervals in (10.7) were found by using a pooled error
variance. We could have used the individual sample variance $s_i^2$ with nine
degrees of freedom in Eq. (10.4) to obtain

    5074.6 lb < $\mu_A$ < 5109.4 lb

    5051.0 lb < $\mu_B$ < 5099.0 lb                (10.8)

    5081.7 lb < $\mu_C$ < 5130.3 lb

since $t_{.05}(9)/\sqrt{10} = (1.833)(0.3162) = 0.5796$, $s_A = 29.92$, $s_B = 41.44$, and
$s_C = 41.95$. However, both sets of intervals, (10.7) and (10.8), should not
be used in the same experiment. If it is reasonable to assume the variances
should be pooled, then only (10.7) should be used, since the error variance
is more precise, having more degrees of freedom. On the other hand, if it
is not reasonable to assume that the variances of the different populations
are the same, then (10.8) should be used, even though there is a loss in the
number of degrees of freedom in the variance estimator. In no case should
an interval for one population mean be obtained by using the pooled variance
and an interval for another population mean by using the single sample
variance.

It should be noted that when the samples are randomly and independently
selected the point estimates of the parameters are unbiased. If, in addition
to these assumptions, the populations are normally distributed, then
confidence intervals may be obtained. Further, if these three assumptions along
with the equality of variance assumption hold, then we may test the
hypothesis $H_0: \mu_1 = \cdots = \mu_k$ and obtain confidence intervals by using the
pooled variance. All of these statements are based on the assumption that
the $j$th observation in the $i$th sample can be written as

$$ x_{ij} = \mu_i + \epsilon_{ij} \qquad (i = 1, \ldots, k;\ j = 1, \ldots, n_i) \tag{10.9} $$

or

$$ x_{ij} = \mu + \alpha_i + \epsilon_{ij} \qquad (i = 1, \ldots, k;\ j = 1, \ldots, n_i) \tag{10.10} $$

where

$$ \mu = \frac{\sum_{i=1}^{k} \mu_i}{k} \quad \text{and} \quad \alpha_i = \mu_i - \mu \tag{10.11} $$

These symbols were discussed and given names in Sect. 8.11.


10.3. ANALYSIS OF VARIANCE FOR ONE-WAY CLASSIFICATION:
THEORY

Before we discuss in detail the one-way classification, we review and
expand some of the related ideas for a single sample. In a population with
mean $\mu$ and variance $\sigma^2$, we may think of any observation $x$ as the sum of
the mean $\mu$ and a deviate $\epsilon$. Thus, in a sample $x_1, x_2, \ldots, x_n$ of size $n$ the
$j$th observation $x_j$ may be written as

$$ x_j = \mu + \epsilon_j \qquad (j = 1, \ldots, n) \tag{10.12} $$

or, in case $\bar{x}$ is the sample mean, as

$$ x_j = \bar{x} + e_j \qquad (j = 1, \ldots, n) \tag{10.13} $$

where $e_j$ is the deviate about the sample mean. If the sample is randomly
selected, $\bar{x}$ and $s^2$, the sample variance, are unbiased estimators of $\mu$ and $\sigma^2$.
It was shown in Sect. 8.12 that

$$ \sum_{j=1}^{n} (x_j - \mu)^2 = \sum_{j=1}^{n} (x_j - \bar{x})^2 + n(\bar{x} - \mu)^2 \tag{10.14} $$

is an algebraic identity, called a sum of squares identity. Theorem 7.3 states
that $\bar{x}$ and $s^2$ are independently distributed, provided a random sample is
drawn from a normal population. Thus, since $n$, $\mu$, and $\sigma^2$ are constants,
it follows that

$$ \frac{s_\mu^2}{\sigma^2} = \frac{n(\bar{x} - \mu)^2}{\sigma^2} \quad \text{and} \quad \frac{\sum_{j=1}^{n} (x_j - \bar{x})^2}{\sigma^2} = \frac{(n - 1)s^2}{\sigma^2} \tag{10.15} $$

are independently distributed. Further, according to Theorems 7.1 and 7.4,
the ratios in Eq. (10.15) are distributed as $\chi^2$ with one degree and $n - 1$
degrees of freedom. Thus, because of the additive nature of $\chi^2$ (see Theorem
7.2), it follows that

$$ \frac{\sum (x_j - \mu)^2}{\sigma^2} = \frac{\sum (x_j - \bar{x})^2}{\sigma^2} + \frac{n(\bar{x} - \mu)^2}{\sigma^2} $$

is distributed as $\chi^2$ with $n$ degrees of freedom. Since $s_\mu^2/\sigma^2$ and $(n - 1)s^2/\sigma^2$
are independently distributed as $\chi^2$ with 1 degree and $n - 1$ degrees of
freedom, respectively, then, according to the corollary to Theorem 9.1,

$$ F(1, n - 1) = \frac{s_\mu^2}{s^2} = \frac{n(\bar{x} - \mu)^2}{\sum (x_j - \bar{x})^2/(n - 1)} \tag{10.16} $$

is distributed as F with 1 degree and $n - 1$ degrees of freedom.
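
The sum of squares identity (10.14) can be verified numerically for any sample and any value of the mean. The sketch below is an illustration under stated assumptions: it uses the coded observations for manufacturer A from Table 10.1, the value `mu=100` is a hypothetical population mean chosen only for the demonstration, and the function name `sum_of_squares_parts` is not from the text.

```python
def sum_of_squares_parts(sample, mu):
    """Return the three terms of the single-sample sum of squares identity."""
    n = len(sample)
    x_bar = sum(sample) / n
    total = sum((x - mu) ** 2 for x in sample)       # sum of (x_j - mu)^2
    within = sum((x - x_bar) ** 2 for x in sample)   # (n - 1) s^2
    mean_part = n * (x_bar - mu) ** 2                # n (x_bar - mu)^2
    return total, within, mean_part

sample_a = [110, 90, 120, 130, 115, 105, 50, 75, 85, 40]
total, within, mean_part = sum_of_squares_parts(sample_a, mu=100)  # mu is hypothetical

# Ratio of the two variance estimators, with 1 and n - 1 degrees of freedom
f_ratio = mean_part / (within / (len(sample_a) - 1))
print(total, within, mean_part, round(f_ratio, 4))
```

The first printed value equals the sum of the next two, whatever value is supplied for `mu`; only the split between the last term and the F ratio depends on it.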

It is clear that the variance estimator $s^2$ is based on deviations of the
observations about the sample mean, whereas $s_\mu^2$ is based on the deviation
of the sample mean from the population mean. Thus, for a given sample,
if $\mu$ is unknown and is hypothesized to be some specified value, $s^2$ is fixed,
but $s_\mu^2$ depends on the hypothesized value of $\mu$, say $\mu_0$. If the value $\mu_0$ is used
in place of $\mu$ in $s_\mu^2$, we obtain

$$ s_{\mu_0}^2 = n(\bar{x} - \mu_0)^2 = n[(\bar{x} - \mu) + (\mu - \mu_0)]^2 $$

$$ = n(\bar{x} - \mu)^2 + n(\mu - \mu_0)^2 + 2n(\bar{x} - \mu)(\mu - \mu_0) $$

Thus

$$ E(s_{\mu_0}^2) = \sigma^2 + n(\mu - \mu_0)^2 \tag{10.17} $$

since

$$ E(\bar{x} - \mu) = 0 \quad \text{and} \quad E[n(\bar{x} - \mu)^2] = \sigma^2 $$

That is, the expected value of $s_{\mu_0}^2$ is larger than $\sigma^2$ when $\mu_0 \ne \mu$. So $s_{\mu_0}^2$ tends
to be larger than $s^2$, and $s_{\mu_0}^2/s^2$ tends to fall in the upper tail of the distribution
of $F(1, n - 1)$. It is for this reason that the critical region is taken in the
upper tail of the F distribution and a significantly large F is used to reject
the hypothesized value $\mu_0$ as the true mean.

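Now we consider $k$ populations. In the $i$th population with mean $\mu_i$
and variance $\sigma^2$, we may think of any observation $x_i$ as the sum of the mean
$\mu_i$ and a deviate $\epsilon_i$ $(i = 1, 2, \ldots, k)$. Thus, for a sample $x_{i1}, x_{i2}, \ldots, x_{in_i}$
of size $n_i$, the $j$th observation $x_{ij}$ in the $i$th population may be written as
Eq. (10.9) or Eq. (10.10). Using Eqs. (10.1) and (10.2), we define the sample
means and effects as follows

$$ \bar{x}_{i\cdot} = \frac{T_{i\cdot}}{n_i}, \qquad \bar{x}_{\cdot\cdot} = \frac{T_{\cdot\cdot}}{n_\cdot}, \qquad a_i = \bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot} \tag{10.18} $$

where

$$ n_\cdot = \sum_{i=1}^{k} n_i $$

Thus, as in Eqs. (8.30) and (8.39), the observation equations may be written as

$$ x_{ij} = \bar{x}_{i\cdot} + e_{ij} \tag{10.19} $$

$$ x_{ij} = \bar{x}_{\cdot\cdot} + a_i + e_{ij} \qquad (i = 1, \ldots, k;\ j = 1, \ldots, n_i) \tag{10.20} $$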

where $e_{ij}$ is the deviate of the $j$th observation about the $i$th sample mean.
It can be shown (Exercise 10.6) that the estimated means and effects defined
in Eq. (10.18) are unbiased estimators of their corresponding parameters
$\mu_i$, $\mu$, and $\alpha_i$, provided the samples are randomly and independently selected.
[Actually, it can be shown that these estimators are the so-called least-squares
estimators (Chap. 14) or maximum likelihood estimators. Further, the
sampling distributions of these estimators have the smallest variances of all linear
functions of the observations which could be used to estimate $\mu_i$, $\mu$, and $\alpha_i$.]
The sum of squares identity for the $i$th sample may be written as

$$ \sum_{j=1}^{n_i} (x_{ij} - \mu_i)^2 = \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i\cdot})^2 + n_i(\bar{x}_{i\cdot} - \mu_i)^2 \qquad (i = 1, 2, \ldots, k) \tag{10.21} $$

When a random sample is drawn, we may think of

$$ \sum_{j=1}^{n_i} (x_{ij} - \mu_i)^2 $$

as partitioned into two component sums of squares which lead to two
independent unbiased estimators of $\sigma^2$. If the sample, in addition to being
random, is drawn from a normal population, then these two component
sums of squares are independently distributed as $\sigma^2\chi^2$ with $n_i - 1$ degrees
and 1 degree of freedom, respectively. Thus, if $k$ samples, in addition to
being random, are independent, then

$$ \sum_{j} (x_{1j} - \bar{x}_{1\cdot})^2,\ n_1(\bar{x}_{1\cdot} - \mu_1)^2,\ \ldots,\ \sum_{j} (x_{kj} - \bar{x}_{k\cdot})^2,\ n_k(\bar{x}_{k\cdot} - \mu_k)^2 \tag{10.22} $$

lead to $2k$ independent unbiased estimators of $\sigma^2$ and, if the $k$ samples are
also drawn from normal populations, the $2k$ component sums of squares are
independently distributed as $\sigma^2\chi^2$ with $n_1 - 1, 1, n_2 - 1, 1, \ldots, n_k - 1, 1$
degrees of freedom, respectively. By adding the $k$ sum of squares identities
in Eq. (10.21) we get

$$ \sum_{i}\sum_{j} (x_{ij} - \mu_i)^2 = \sum_{i}\sum_{j} (x_{ij} - \bar{x}_{i\cdot})^2 + \sum_{i} n_i(\bar{x}_{i\cdot} - \mu_i)^2 \tag{10.23} $$

Thus, due to the additive property of $\chi^2$ (Theorem 7.2), it follows that

$$ \frac{\sum_{i}\sum_{j} (x_{ij} - \bar{x}_{i\cdot})^2}{\sigma^2} \quad \text{and} \quad \frac{\sum_{i} n_i(\bar{x}_{i\cdot} - \mu_i)^2}{\sigma^2} \tag{10.24} $$

are independently distributed as $\chi^2$ with $n_\cdot - k$ and $k$ degrees of freedom,
respectively. Also

$$ \frac{\sum_{i}\sum_{j} (x_{ij} - \mu_i)^2}{\sigma^2} $$

is distributed as $\chi^2$ with $n_\cdot$ degrees of freedom but is not independent of the
distribution of either of the statistics in (10.24). Further

$$ s_p^2 = \frac{\sum_{i}\sum_{j} (x_{ij} - \bar{x}_{i\cdot})^2}{n_\cdot - k} \quad \text{and} \quad s_1^2 = \frac{\sum_{i} n_i(\bar{x}_{i\cdot} - \mu_i)^2}{k} \tag{10.25} $$

are independent unbiased estimators of $\sigma^2$. Since $k s_1^2/\sigma^2$ and $(n_\cdot - k)s_p^2/\sigma^2$
are independently distributed as $\chi^2$ with $k$ and $n_\cdot - k$ degrees of freedom,
respectively,

$$ F(k, n_\cdot - k) = \frac{s_1^2}{s_p^2} = \frac{\sum_{i} n_i(\bar{x}_{i\cdot} - \mu_i)^2 / k}{\sum_{i}\sum_{j} (x_{ij} - \bar{x}_{i\cdot})^2 / (n_\cdot - k)} \tag{10.26} $$

is distributed as F with $k$ and $n_\cdot - k$ degrees of freedom.

The estimator $s_p^2$, the pooled variance, sometimes called the within
variance or error variance $s_e^2$, is based on deviations of sample values from their
sample means and, thus, does not depend on the population means $\mu_1, \mu_2,
\ldots, \mu_k$. That is, the estimator $s_p^2$ is independent of the population means.
The estimator $s_1^2$ of $\sigma^2$, on the other hand, does depend on the population
means. Thus, we may test a set of hypothesized means $\mu_{01}, \mu_{02}, \ldots, \mu_{0k}$, i.e.

$$ H_0: \mu_1 = \mu_{01},\ \mu_2 = \mu_{02},\ \ldots,\ \mu_k = \mu_{0k} \tag{10.27} $$

by substituting these values in the second part of Eq. (10.25) to obtain $s_{01}^2$,
and then comparing this estimator of $\sigma^2$ with $s_p^2$, using the $F(k, n_\cdot - k)$
distribution. If $H_0$ is true, then $s_{01}^2/s_p^2$ is distributed as $F(k, n_\cdot - k)$. However,
if some of the hypothesized values $\mu_{0i}$ are not equal to the true mean $\mu_i$,
then we can show, following the argument used in proving Eq. (10.17), that

$$ E(s_{01}^2) = \sigma^2 + \frac{1}{k} \sum_{i=1}^{k} n_i(\mu_i - \mu_{0i})^2 \tag{10.28} $$

That is, the expected value of $s_{01}^2$ is larger than $\sigma^2$ when some $\mu_{0i} \ne \mu_i$.
This means that $s_{01}^2$ tends to be larger than $s_p^2$ and, thus, $s_{01}^2/s_p^2$ tends to fall
in the upper tail of the distribution of $F(k, n_\cdot - k)$. It is for this reason that
the critical region is taken in the upper tail of the F distribution and a
significantly large F is used to reject $H_0$. It should be noted that the hypothesis
(10.27) is not often tested, since its rejection leads to the unsatisfactory
statement that some population mean is not as hypothesized. (This raises
questions such as, which and how many means are not as hypothesized?) Also,
it would often be difficult to decide on specific values for $\mu_{0i}$ $(i = 1, 2, \ldots, k)$.

This brings us to the problem of giving the theoretical justification for 
the test of the more useful null hypothesis 

/f„: p, = Pj = . - . = p^ = /X (10.29a) 

or 

Ho:a, = a.= ^ a^ = 0 (10.29b) 

Obviously, a test of (10.29) requires that 

K 

2 ”i(^. - 

in Eq. (10.23) be expressed in terms of deviates from p, the true over-all 
mean, and this suggests that the over-all sample mean Jc.. be introduced 
Thus, we write 

- Pi = Pi. - X..) - (p, - p)] + (x.. - p) (lO.SOa) 
or 
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X, - fi, = (a, - ff.) + (^ -n) (10 30b) 

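Squaring both sides of Eq. (10.30b) and summing over all $n_{\cdot}$ observations simplifies to

$$\sum_{i=1}^{k} n_i(\bar{x}_{i\cdot} - \mu_i)^2 = \sum_{i=1}^{k} n_i(a_i - \alpha_i)^2 + n_{\cdot}(\bar{x}_{\cdot\cdot} - \mu)^2 \tag{10.31}$$

Substituting Eq. (10.31) in Eq. (10.23) gives

$$\sum_i\sum_j (x_{ij} - \mu_i)^2 = \sum_i\sum_j (x_{ij} - \bar{x}_{i\cdot})^2 + \sum_i n_i(a_i - \alpha_i)^2 + n_{\cdot}(\bar{x}_{\cdot\cdot} - \mu)^2 \tag{10.32}$$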
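The identity (10.32) can be checked numerically. The following is a minimal sketch in Python; the three samples and the "true" means are hypothetical values chosen only for illustration, and the over-all mean is taken as the weighted mean so that $\sum n_i\alpha_i = 0$.

```python
# Numerical check of the sum of squares identity (10.32).
# The samples and the "true" means are hypothetical illustration values.
samples = [[4.0, 6.0, 5.0], [7.0, 9.0], [1.0, 2.0, 3.0, 2.0]]
mu_i = [5.5, 7.0, 2.5]                      # assumed true population means

n_i = [len(s) for s in samples]
n_dot = sum(n_i)
xbar_i = [sum(s) / len(s) for s in samples]
xbar = sum(sum(s) for s in samples) / n_dot

# Over-all true mean, weighted so that sum(n_i * alpha_i) = 0.
mu = sum(n * m for n, m in zip(n_i, mu_i)) / n_dot
alpha = [m - mu for m in mu_i]              # true effects
a = [xb - xbar for xb in xbar_i]            # estimated effects

# Left-hand side: deviations of observations from their true means.
lhs = sum((x - m) ** 2 for s, m in zip(samples, mu_i) for x in s)

# Right-hand side: within term + effects term + over-all mean term.
within = sum((x - xb) ** 2 for s, xb in zip(samples, xbar_i) for x in s)
effects = sum(n * (ai - al) ** 2 for n, ai, al in zip(n_i, a, alpha))
mean_term = n_dot * (xbar - mu) ** 2
rhs = within + effects + mean_term

print(lhs, rhs)
```

The two printed values agree up to floating-point rounding, whatever true means are assumed.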

From the partition theory of the $\chi^2$ distribution it follows that the two terms on the right-hand side of Eq. (10.31) are independently distributed as $\sigma^2\chi^2$ with $k - 1$ degrees and 1 degree of freedom, respectively. Further,

$$s_a^2 = \frac{\sum_i n_i(a_i - \alpha_i)^2}{k - 1} \quad\text{and}\quad s_m^2 = n_{\cdot}(\bar{x}_{\cdot\cdot} - \mu)^2 \tag{10.33}$$

are independent and unbiased estimators of $\sigma^2$. Since $s_p^2$ is an unbiased estimator of $\sigma^2$ and is independent of both $s_a^2$ and $s_m^2$, it follows that

$$\frac{\sum_i n_i(a_i - \alpha_i)^2/(k - 1)}{s_p^2}$$

is distributed as F with $k - 1$ and $n_{\cdot} - k$ degrees of freedom, and

$$\frac{n_{\cdot}(\bar{x}_{\cdot\cdot} - \mu)^2}{s_p^2}$$

is distributed as F with 1 degree and $n_{\cdot} - k$ degrees of freedom.

We have seen that the estimator $s_p^2$ of $\sigma^2$ does not depend on the population means (effects). The estimator $s_a^2$ of $\sigma^2$, on the other hand, does depend on the population effects. Thus, we may test the hypothesis (10.29) by substituting these hypothesized values of $\alpha_i$ (i.e., zero) in the first part of Eq. (10.33) to obtain

$$s_a^2 = \frac{\sum_i n_i(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2}{k - 1}$$

and then comparing this estimator of $\sigma^2$ with $s_p^2$, using the $F(k - 1,\ n_{\cdot} - k)$ distribution. If $H_0$ in (10.29) is true, then $s_a^2/s_p^2$ is distributed as $F(k - 1,\ n_{\cdot} - k)$, since $s_a^2$ and $s_p^2$ are independent and unbiased estimators of the same variance. However, if some of the true means $\mu_i$ are not equal to the common mean $\mu$ (or some effects are not zero), then we can show, following the argument used in proving Eq. (10.17), that

$$E(s_a^2) = \sigma^2 + \frac{\sum_i n_i\alpha_i^2}{k - 1} \tag{10.36}$$

when

$$n_1\alpha_1 + \cdots + n_k\alpha_k = 0$$

That is, the expected value of $s_a^2$ is larger than $\sigma^2$ when some $\mu_i \neq \mu$ (or some $\alpha_i \neq 0$). It is for this reason that the critical region is taken in the upper tail of the F distribution and a significantly large F is used to reject hypothesis (10.29).

Also, one could test the null hypothesis that the over-all mean $\mu$ is some specified value $\mu_0$ by comparing $s_m^2$ with $s_p^2$, using the $F(1,\ n_{\cdot} - k)$ distribution. If a sample ratio $s_m^2/s_p^2$ falls in the upper region (upper tail) of the F distribution, we reject $\mu = \mu_0$ and conclude that $\mu \neq \mu_0$. If the true mean is actually different from the hypothesized mean $\mu_0$, then the expected value of $s_m^2 = n_{\cdot}(\bar{x}_{\cdot\cdot} - \mu_0)^2$ is

$$E(s_m^2) = \sigma^2 + n_{\cdot}(\mu - \mu_0)^2 \tag{10.37}$$

where

$$n_{\cdot} = \sum_{i=1}^{k} n_i$$

The sum of squares identity (10.32) and associated mean squares used in testing the hypotheses (10.29) and $\mu = \mu_0$ are summarized in Table 10.3.


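Table 10.3

Analysis of Variance for One-Way Classification

Source of Variation     | Sum of Squares                                  | Degrees of Freedom | Mean Square | Test Statistic  | Expected Mean Square
Over-all mean           | $n_{\cdot}(\bar{x}_{\cdot\cdot} - \mu_0)^2$     | 1                  | $s_m^2$     | $s_m^2/s_p^2$   | $\sigma^2 + n_{\cdot}(\mu - \mu_0)^2$
Treatment effects       | $\sum_i n_i(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2$ | $k - 1$    | $s_a^2$     | $s_a^2/s_p^2$   | $\sigma^2 + \sum_i n_i\alpha_i^2/(k - 1)$
Within error            | $\sum_i\sum_j (x_{ij} - \bar{x}_{i\cdot})^2$    | $n_{\cdot} - k$    | $s_p^2$     |                 | $\sigma^2$
Total (not corrected)   | $\sum_i\sum_j (x_{ij} - \mu_0)^2$               | $n_{\cdot}$        |             |                 |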





Following the proof given in Sect. 8.12, we can show that

$$\sum_i\sum_j (x_{ij} - \bar{x}_{\cdot\cdot})^2 = \sum_i\sum_j (x_{ij} - \bar{x}_{i\cdot})^2 + \sum_i n_i(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 \tag{10.38}$$

Thus, the sum of the two terms on the right-hand side of Eq. (10.38) may be replaced by the left-hand side of (10.38). The relation (10.38) is called the sum of squares identity for the null hypothesis given in (10.29), and Table 10.4 is a typical analysis of variance table.


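Table 10.4

Analysis of Variance for One-Way Classification

Source of Variation | Sum of Squares                                          | Degrees of Freedom | Mean Square | Test Statistic | Expected Mean Square
Among means         | $\sum_i n_i(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2$ | $k - 1$            | $s_a^2$     | $s_a^2/s_p^2$  | $\sigma^2 + \sum_i n_i\alpha_i^2/(k - 1)$
Within              | $\sum_i\sum_j (x_{ij} - \bar{x}_{i\cdot})^2$            | $n_{\cdot} - k$    | $s_p^2$     |                | $\sigma^2$
Total               | $\sum_i\sum_j (x_{ij} - \bar{x}_{\cdot\cdot})^2$        | $n_{\cdot} - 1$    |             |                |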







10.4. POWER OF ANALYSIS OF VARIANCE FOR A ONE-WAY CLASSIFICATION


If the hypothesis in (10.29) is false, then the expected value of $s_a^2$, the mean square for treatments, is equal to

$$\sigma^2 + \frac{1}{k - 1}\sum_{i=1}^{k} n_i\alpha_i^2$$

Thus, $s_a^2$ is not distributed as $\sigma^2\chi^2/(k - 1)$, and the ratio $s_a^2/s_p^2$ is not distributed as F with $k - 1$ and $n_{\cdot} - k$ degrees of freedom. In fact, in this case $s_a^2/s_p^2$ is said to be distributed as a noncentral F [26, 31]. However, if we let

$$\lambda^2 = 1 + \frac{\sum_i n_i\alpha_i^2}{(k - 1)\sigma^2} \tag{10.39}$$


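it can be shown that $s_a^2/s_p^2$ is distributed approximately as $\lambda^2 F$ with $\nu_1'$ and $\nu_2 = n_{\cdot} - k$ degrees of freedom, where

$$\nu_1' = \frac{(k - 1)\lambda^4}{2\lambda^2 - 1} \tag{10.40}$$

Since the power function of the test of the hypothesis in (10.29) may be expressed in terms of $\lambda^2$ as

$$p(\lambda^2) = P\left[\frac{s_a^2}{s_p^2} > F_\alpha(k - 1,\ n_{\cdot} - k)\ \Big|\ \lambda^2\right]$$

it follows that

$$p(\lambda^2) \approx P\left[F(\nu_1',\ n_{\cdot} - k) > \frac{F_\alpha(k - 1,\ n_{\cdot} - k)}{\lambda^2}\right] \tag{10.41}$$

also gives the power of the test approximately. Clearly, the power function is an increasing function of $\lambda^2$, since $F_\alpha(k - 1,\ n_{\cdot} - k)/\lambda^2$ is a decreasing function of $\lambda^2$.

To find the value of $\lambda^2$ for which $p(\lambda^2) = 1 - \beta(\lambda^2)$, we solve

$$P\left[F(\nu_1',\ n_{\cdot} - k) > \frac{F_\alpha(k - 1,\ n_{\cdot} - k)}{\lambda^2}\right] = 1 - \beta(\lambda^2)$$

which leads to

$$\lambda^2 = \frac{F_\alpha(k - 1,\ n_{\cdot} - k)}{F_{1-\beta}(\nu_1',\ n_{\cdot} - k)} \tag{10.42}$$

Using this relation and Table VII, we may determine a sufficient number of points to draw a power curve in terms of $\lambda^2$.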

From Eq. (10.39) it is clear that $\lambda^2 > 1$ may be used to measure an alternative hypothesis to the hypothesis (10.29b), for the more $\lambda^2$ exceeds 1, the more the alternative deviates from the null hypothesis. Since $\lambda^2$ depends on $k$, $\sigma^2$, $n_1, \ldots, n_k$, $\alpha_1, \ldots, \alpha_k$, we may determine power for an $\alpha$ level test by specifying values of these parameters. If $n_i = n$ $(i = 1, \ldots, k)$, then we need specify only $n$, $k$, $\sigma^2$, $\alpha_1, \ldots, \alpha_{k-1}$ in order to determine power for an $\alpha$ level test. Actually, questions concerning power are often raised after one knows $\alpha$, $n$, and $k$. This means that only $\sigma^2$ and $\alpha_1, \ldots, \alpha_{k-1}$ (or $\sigma^2$ and $\sum\alpha_i^2$) must be specified in order to determine power.

For the case where $n_i = n$ $(i = 1, \ldots, k)$, we may also write

$$\lambda^2 = 1 + \frac{k}{k - 1}\,\phi^2 \tag{10.43}$$

where

$$\phi^2 = \frac{\sum_{i=1}^{k}\alpha_i^2 / k}{\sigma^2 / n} \tag{10.44}$$

Clearly, $\lambda^2$ is an increasing function of $\phi^2$. Thus, power of the test of the hypothesis in (10.29) may be expressed as a function of $\phi$, as in the usual case. Table VIII shows eight pages of graphs with $p = 1 - \beta$ on the vertical scale corresponding to $\phi$ on the horizontal. Two levels of significance, $\alpha = 0.01$ and $0.05$, for eight values of $\nu_1$, and several values of $\nu_2$, are shown. [The reader should note that $\nu_1$ is used in place of $\nu_1'$, since the exact noncentral F distribution was used in the calculations in place of the approximation $\nu_1'$.] There is a different curve for each set of values $\alpha$, $\nu_1$, and $\nu_2$. These curves may be used for purposes other than finding power, just as was the case for power curves associated with the t and $\chi^2$ distributions.
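The two measures of departure from the null hypothesis can be computed directly. The sketch below is in Python and assumes that $\lambda^2$ is the ratio of the expected treatment mean square to $\sigma^2$ and that $\phi^2$ is the mean squared effect divided by $\sigma^2/n$; the effects, variance, and sample size are illustrative values only.

```python
def lambda_sq(n_i, alpha, sigma_sq):
    """lambda^2 = 1 + sum(n_i * alpha_i^2) / ((k - 1) * sigma^2)."""
    k = len(n_i)
    weighted = sum(n * a * a for n, a in zip(n_i, alpha))
    return 1 + weighted / ((k - 1) * sigma_sq)


def phi_sq(n, alpha, sigma_sq):
    """phi^2 = (sum(alpha_i^2) / k) / (sigma^2 / n), equal sample sizes n."""
    k = len(alpha)
    return (sum(a * a for a in alpha) / k) / (sigma_sq / n)


# Illustrative values: five populations, effects summing to zero.
alpha = [-5, -5, 0, 5, 5]
sigma_sq = 20
n = 6
k = len(alpha)

lam2 = lambda_sq([n] * k, alpha, sigma_sq)
phi2 = phi_sq(n, alpha, sigma_sq)

# For equal sample sizes the two are tied by lambda^2 = 1 + k*phi^2/(k - 1).
print(lam2, phi2, 1 + k * phi2 / (k - 1))
```

For equal sample sizes the first and third printed values coincide, so a power curve drawn against $\phi$ carries the same information as one drawn against $\lambda^2$.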
Example 10.2. Suppose that five normal populations have common variance $\sigma^2 = 20$ with means $\mu_1 = 65$, $\mu_2 = 65$, $\mu_3 = 70$, $\mu_4 = 75$, and $\mu_5 = 75$, respectively. How many random observations $n$ should one make on each of the five populations so that a 0.01 level analysis of variance test of the hypothesis $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu$ will have a 0.90 chance of detecting differences?

The degrees of freedom for the test are $\nu_1 = 4$ and $\nu_2 = 5(n - 1)$. Since $\mu = \sum\mu_i/5 = 70$, we have $\alpha_1 = \alpha_2 = -5$, $\alpha_3 = 0$, $\alpha_4 = \alpha_5 = 5$, and

$$\phi^2 = \frac{[(-5)^2 + (-5)^2 + 0^2 + 5^2 + 5^2]/5}{20/n} = n$$

We wish to find $n$ such that when $\alpha = 0.01$, $\nu_1 = 4$, $p = 0.90$ (or $\beta = 0.10$), then $\phi = \sqrt{n}$ approximately. Using the following selected values of $n$ along with Table VIII, we have


n                       3       4       5       6
ν₂ = 5(n - 1)          10      15      20      25
φ = √n                1.73    2.00    2.24    2.45
φ from Table VIII     2.26    2.56    2.41    2.35


From these calculations we see that the graphic values of $\phi$ are greater than $\sqrt{n}$ when $n = 3$, 4, and 5 and that the graphic value of $\phi$ for $n = 6$ is less than $\phi = \sqrt{n}$. This indicates that the graphic value of $\phi$ is equal to $\sqrt{n}$ for some value of $n$ between 5 and 6. If we take five observations from each population and run a 0.01 level analysis of variance test, the chance of making a type 2 error is greater than 0.10; if we take six observations, the chance of making a type 2 error is less than 0.10. Thus, in order to have adequate protection, we take samples of size six in this experiment.
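The values of $\nu_2$ and $\phi = \sqrt{n}$ used in Example 10.2 can be tabulated with a short Python sketch. It computes only the quantities defined by the formulas of this section; the graphic values must still be read from Table VIII, which is not reproduced here.

```python
import math

sigma_sq = 20.0
mu_i = [65, 65, 70, 75, 75]          # population means of Example 10.2
k = len(mu_i)
mu = sum(mu_i) / k                   # over-all mean, 70
alpha = [m - mu for m in mu_i]       # effects: -5, -5, 0, 5, 5


def phi_for(n):
    """phi for equal sample sizes n: sqrt((sum(alpha^2)/k) / (sigma^2/n))."""
    return math.sqrt((sum(a * a for a in alpha) / k) / (sigma_sq / n))


rows = [(n, k * (n - 1), round(phi_for(n), 2)) for n in (3, 4, 5, 6)]
for n, nu2, phi in rows:
    print(n, nu2, phi)
```

Each row gives $n$, the error degrees of freedom $5(n - 1)$, and $\phi$, which here reduces to $\sqrt{n}$.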

In an actual experiment we do not usually know $\sigma^2$ and $\alpha_i$ $(i = 1, \ldots, k)$. Thus, the above method cannot be used to determine what sample size to draw. However, it is usually possible to guess values of $\sum\alpha_i^2$ and $\sigma^2$ so that in determining a value of $n$ the size of the type 2 error does not exceed the specified value $\beta$. (One should guess $\sum\alpha_i^2$ and $\sigma^2$ so as to underestimate $\phi$. This means that one should not underestimate $\sigma^2$ or overestimate $\sum\alpha_i^2$.)

10.5. COMPUTATIONS. RELATION BETWEEN MODEL AND DATA 

We have already seen how important the sum of squares identity, Eq. 
(10.38), is in breaking the total sum of squares into the sum of two parts and 
in preparing analysis of variance tables such as Tables 10.3 and 10.4. At 
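this time we derive better computational forms for finding the total and component sums of squares in an analysis of variance table. The total sum of squares, denoted by TSS, is given by

$$TSS = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{\cdot\cdot})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \frac{1}{n_{\cdot}}\Big(\sum_i\sum_j x_{ij}\Big)^2$$

or

$$TSS = \sum_i\sum_j x_{ij}^2 - \frac{T_{\cdot\cdot}^2}{n_{\cdot}} \tag{10.45a}$$

or

$$TSS = \frac{n_{\cdot}\sum_i\sum_j x_{ij}^2 - T_{\cdot\cdot}^2}{n_{\cdot}} = \frac{LTSS}{n_{\cdot}} \tag{10.45b}$$

where

$$LTSS = n_{\cdot}\sum_i\sum_j x_{ij}^2 - T_{\cdot\cdot}^2 \tag{10.46}$$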


denotes “large total sum of squares.” Both Eqs. (10.45a) and (10.45b) are 
very useful in computing the total sum of squares. Equation (10.45b) is 
particularly good with desk calculators; Eq. (10.45a) is the form normally 
used for reasons which will soon be apparent. 

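The among means sum of squares, often called the treatment sum of squares, is denoted by ASS and given by

$$ASS = \sum_{i=1}^{k} n_i(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2 = \sum_i n_i\Big(\frac{T_i}{n_i} - \frac{T_{\cdot\cdot}}{n_{\cdot}}\Big)^2 = \sum_i\Big(\frac{T_i^2}{n_i} - \frac{2T_iT_{\cdot\cdot}}{n_{\cdot}} + \frac{n_iT_{\cdot\cdot}^2}{n_{\cdot}^2}\Big) = \sum_i\frac{T_i^2}{n_i} - \frac{2T_{\cdot\cdot}^2}{n_{\cdot}} + \frac{T_{\cdot\cdot}^2}{n_{\cdot}}$$

or

$$ASS = \sum_i\frac{T_i^2}{n_i} - \frac{T_{\cdot\cdot}^2}{n_{\cdot}} \tag{10.47}$$

In case $n_i = n$ $(i = 1, \ldots, k)$, Eq. (10.47) reduces to

$$ASS = \frac{\sum_i T_i^2}{n} - \frac{T_{\cdot\cdot}^2}{nk} \tag{10.47a}$$

or

$$ASS = \frac{k\sum_i T_i^2 - T_{\cdot\cdot}^2}{nk} = \frac{LASS}{nk} \tag{10.47b}$$

where

$$LASS = k\sum_i T_i^2 - T_{\cdot\cdot}^2 \tag{10.48}$$

denotes “large among mean sum of squares.” If the sample sizes are not equal, then Eq. (10.47) is used; otherwise, either Eq. (10.47a) or Eq. (10.47b) is applied. Equation (10.47a) is the form normally used; Eq. (10.47b) is good with the desk calculator.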

The within sample sum of squares, often called error or pooled or residual sum of squares, is denoted by WSS and is given by

$$WSS = TSS - ASS \tag{10.49}$$

For the computation of WSS directly from the data, use

$$WSS = \sum_i\sum_j x_{ij}^2 - \sum_i\frac{T_i^2}{n_i} \tag{10.50a}$$

for unequal sample sizes, or

$$WSS = \sum_i\sum_j x_{ij}^2 - \frac{\sum_i T_i^2}{n} \tag{10.50b}$$

for equal sample sizes.

In expressions (10.45a), (10.47), (10.47a), (10.50a), and (10.50b) note that the divisor in each case is the same as the number of observations making up the total. For example, in Eq. (10.47) $T_i$ is the sum of $n_i$ observations, and the divisor is $n_i$; $T_{\cdot\cdot}$ is the sum of

$$n_{\cdot} = \sum_i n_i$$

observations, and the divisor is $n_{\cdot}$. This is, no doubt, the primary reason that these forms are normally preferred to the forms with large sum of squares.
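The computing forms can be written as a small routine. The sketch below, in Python, assumes the forms TSS = (sum of squared observations) minus $T_{\cdot\cdot}^2/n_{\cdot}$, ASS = $\sum T_i^2/n_i$ minus the same correction term, and WSS = TSS minus ASS, and compares them with the defining sums of squared deviations. The data are hypothetical and the function names are our own.

```python
def anova_sums(samples):
    """Return (TSS, ASS, WSS) from the totals-based computing forms."""
    n_dot = sum(len(s) for s in samples)
    grand_total = sum(sum(s) for s in samples)           # T..
    sum_sq = sum(x * x for s in samples for x in s)      # sum of x_ij^2
    correction = grand_total ** 2 / n_dot                # T..^2 / n.
    tss = sum_sq - correction
    ass = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    return tss, ass, tss - ass


def anova_sums_by_definition(samples):
    """Return (TSS, ASS, WSS) as sums of squared deviations."""
    n_dot = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_dot
    means = [sum(s) / len(s) for s in samples]
    tss = sum((x - grand_mean) ** 2 for s in samples for x in s)
    ass = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    wss = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return tss, ass, wss


# Hypothetical data with unequal sample sizes.
data = [[4, 6, 5], [7, 9], [1, 2, 3, 2]]
print(anova_sums(data))
print(anova_sums_by_definition(data))
```

Both calls give the same three sums of squares, which illustrates why the totals-based forms are convenient: only the sample totals, the grand total, and the sum of squared observations are needed.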

In order better to understand the relation between the model and a set of data, we consider an example in which samples are drawn from known populations. The computing forms just described are used.

Example 10.3. Random samples of size ten were drawn from normal populations $n(2, 1)$, $n(2.2, 1)$, $n(2.4, 1)$, and $n(2.6, 1)$, respectively. The measurements are given in Table 10.5 along with totals and sums of squares in columns. (a) Compare the estimated means and effects with the true parameters. (b) Compare the estimates of variance with $\sigma^2 = 1$. (c) Discuss some confidence interval problems.


Table 10.5

Random Normal Deviates

                    Sample 1     Sample 2     Sample 3     Sample 4
                      x_1          x_2          x_3          x_4
                     3.355        0.273        3.539        3.074
                     1.086        2.155        2.929        3.103
                     2.367        1.725        3.025        2.389
                     0.248        0.949        4.097        4.766
                     1.694        0.458        2.236        2.553
                     1.546        1.455        3.256        3.821
                     1.266        2.289        3.374        1.905
                     0.713        2.673        1.781        2.350
                     0.000        1.800        2.566        1.161
                     3.406        2.407        2.510        2.122

T_i                 15.681       16.184       29.313       27.244       88.422 = T..
Σ_j x_ij²        37.071327    32.339668    90.081121    83.620342      243.112458 = Σ_i Σ_j x_ij²


For (a), the estimated means $\bar{x}_{i\cdot}$ are 1.56, 1.62, 2.93, and 2.72, and the population means $\mu_i$ are 2.00, 2.20, 2.40, and 2.60, respectively. Two samples underestimated their population means by $\bar{x}_{i\cdot} - \mu_i$ equal to $-0.44$ and $-0.58$, and two samples overestimated their population means by 0.53 and 0.12. The over-all population mean is $\mu = 2.30$ with an estimate of $\bar{x}_{\cdot\cdot} = T_{\cdot\cdot}/40 = 2.21$ and a difference of $\bar{x}_{\cdot\cdot} - \mu = -0.09$. Thus, the true effects $\alpha_i$ and estimated effects $a_i$ are $-0.3$, $-0.1$, 0.1, 0.3 and $-0.65$, $-0.59$, 0.72, 0.51, respectively, with differences $a_i - \alpha_i$ of $-0.35$, $-0.49$, 0.62, and 0.21. Note that the sum of the effects is zero (apart from rounding) for both the true and estimated effects, and, therefore, the sum $\sum(a_i - \alpha_i)$ is zero. Further, note that the relation (10.30b) holds for each sample and that any one of the three differences can be found in terms of the other two. For example, in sample 1, $a_1 - \alpha_1 = (\bar{x}_{1\cdot} - \mu_1) - (\bar{x}_{\cdot\cdot} - \mu) = -0.44 - (-0.09) = -0.35$.

The data in Table 10.5 along with the above discussion may be used to give some idea of how much the estimates can be expected to miss the true means and effects for populations with variance 1 and sample sizes of 10 and 40. As the variance gets smaller and the sample size gets larger, we expect the point estimates on the average to get closer to the parameters they estimate; as the variance gets larger and the sample size smaller, we expect the point estimates to be farther away from the parameters on the average. For a variance of 1, we see from the above argument that for a sample of size 40 the estimated mean deviates 0.09 units from the population mean, whereas for samples of size 10 the estimated means on the average (of the absolute values of the deviates) deviate

$$\frac{|-0.44| + |-0.58| + 0.53 + 0.12}{4} = 0.42$$

units from their population means. Even though the above discussions are based on only 40 observations, the conclusions are fairly typical of what one expects when working with point estimates of parameters.

The good computational forms of this and other sections are used to find estimates of the common variance 1. For the four samples, the sums of squares and variances are given by

$$SS_i = \sum_j x_{ij}^2 - \frac{T_i^2}{n_i} \tag{10.51}$$

and

$$s_i^2 = \frac{SS_i}{n_i - 1} \qquad (i = 1, 2, 3, 4) \tag{10.52}$$

respectively. In particular,

$$SS_1 = 37.071327 - \frac{(15.681)^2}{10} = 12.4820, \quad SS_2 = 6.1475, \quad SS_3 = 4.1559, \quad SS_4 = 9.3968$$

so that

$$s_1^2 = 1.3869, \quad s_2^2 = 0.6831, \quad s_3^2 = 0.4618, \quad s_4^2 = 1.0441$$

Thus, the four independent samples of size 10 give independent estimates of variance with nine degrees of freedom. We see that the fourth estimate is closest to the true variance of 1. Another estimate of variance, the pooled variance $s_p^2$, is given by

$$s_p^2 = \frac{SS_1 + \cdots + SS_4}{(n_1 - 1) + \cdots + (n_4 - 1)} = \frac{12.4820 + \cdots + 9.3968}{9 + \cdots + 9} = \frac{32.1822}{36} = 0.894$$

These five estimators are all unbiased, but the last one, $s_p^2$, is the best, since it has 36 degrees of freedom. It should be noted that $s_p^2$ is not closest to $\sigma^2 = 1$ in this particular problem. We select $s_p^2$ because it can be expected to be closer to 1 on the average than any of the others.

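Further, using Eqs. (10.45a), (10.47a), and (10.49), we obtain

$$TSS = 243.112458 - \frac{(88.422)^2}{40} = 47.6512$$

$$ASS = \frac{(15.681)^2 + (16.184)^2 + (29.313)^2 + (27.244)^2}{10} - \frac{(88.422)^2}{40} = 15.4690$$

and

$$WSS = TSS - ASS = 32.1822$$

Letting

$$s_T^2 = \frac{TSS}{nk - 1}, \qquad s_a^2 = \frac{ASS}{k - 1}, \qquad\text{and}\qquad s_p^2 = \frac{WSS}{k(n - 1)} \tag{10.53}$$

we obtain, on substituting the above values,

$$s_T^2 = \frac{47.6512}{39} = 1.222, \qquad s_a^2 = \frac{15.4690}{3} = 5.156, \qquad s_p^2 = \frac{32.1822}{36} = 0.894$$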
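The arithmetic of this example can be retraced from the column totals and sums of squares of Table 10.5 alone. The following Python sketch applies the totals-based computing forms for equal sample sizes; the variable names are our own.

```python
# Column totals T_i and column sums of squares from Table 10.5.
totals = [15.681, 16.184, 29.313, 27.244]
sums_of_squares = [37.071327, 32.339668, 90.081121, 83.620342]
n, k = 10, 4                                   # observations per sample, samples

grand_total = sum(totals)                      # T.. = 88.422
correction = grand_total ** 2 / (n * k)        # T..^2 / (nk)

tss = sum(sums_of_squares) - correction        # total sum of squares
ass = sum(t * t for t in totals) / n - correction   # among means sum of squares
wss = tss - ass                                # within sum of squares

# Per-sample sums of squares and variances, each with n - 1 = 9 d.f.
ss_i = [q - t * t / n for q, t in zip(sums_of_squares, totals)]
var_i = [ss / (n - 1) for ss in ss_i]

total_ms = tss / (n * k - 1)
among_ms = ass / (k - 1)
pooled_var = wss / (k * (n - 1))
f_ratio = among_ms / pooled_var

print(round(tss, 4), round(ass, 4), round(wss, 4))
print([round(v, 4) for v in var_i])
print(round(total_ms, 3), round(among_ms, 3), round(pooled_var, 3), round(f_ratio, 2))
```

The pooled variance obtained as WSS/36 agrees with the value obtained by adding the four separate sums of squares, since the within sum of squares is exactly that sum.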


The variance estimator $s_p^2$, also found above, is an unbiased estimator of $\sigma^2$, even though the population means are different. This is not the case with $s_a^2$ and $s_T^2$. In fact, according to Eq. (10.36), $s_a^2$ is an unbiased estimator of

$$\sigma^2 + \frac{n\sum\alpha_i^2}{k - 1} = 1 + \frac{10[(-0.3)^2 + (-0.1)^2 + (0.1)^2 + (0.3)^2]}{3} = \frac{5}{3} = 1.67$$

Also, using Eqs. (10.49) and (10.53), we can write

$$(nk - 1)s_T^2 = (nk - k)s_p^2 + (k - 1)s_a^2$$

so that

$$E[(nk - 1)s_T^2] = E[(nk - k)s_p^2 + (k - 1)s_a^2]$$

or

$$E(s_T^2) = \frac{1}{nk - 1}\big[(nk - k)E(s_p^2) + (k - 1)E(s_a^2)\big] = \frac{1}{nk - 1}\Big[(nk - k)\sigma^2 + (k - 1)\Big(\sigma^2 + \frac{n\sum\alpha_i^2}{k - 1}\Big)\Big] = \sigma^2 + \frac{n\sum\alpha_i^2}{nk - 1} \tag{10.54}$$

That is, $s_T^2$ is an unbiased estimator of

$$\sigma^2 + \frac{n\sum\alpha_i^2}{nk - 1} = 1 + \frac{10[(-0.3)^2 + (-0.1)^2 + (0.1)^2 + (0.3)^2]}{39} = \frac{41}{39} = 1.05$$

For our samples the estimates $s_a^2 = 5.156$ and $s_T^2 = 1.222$ are greater than what we expect on the average. It is not particularly surprising that $s_T^2$ deviates less from $E(s_T^2) = 1.05$ than $s_a^2$ does from $E(s_a^2) = 1.67$. This is expected, since the number of degrees of freedom, 39, associated with $s_T^2$ is much greater than the number of degrees of freedom, 3, associated with $s_a^2$.
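The two expected mean squares quoted above follow from exact arithmetic on the true effects. A minimal Python sketch using rational numbers, with the true effects of Example 10.3 and variable names of our own choosing:

```python
from fractions import Fraction

sigma_sq = Fraction(1)
n, k = 10, 4
# True effects of Example 10.3: population means 2.0, 2.2, 2.4, 2.6 about 2.3.
alpha = [Fraction(-3, 10), Fraction(-1, 10), Fraction(1, 10), Fraction(3, 10)]
sum_alpha_sq = sum(a * a for a in alpha)                 # 1/5

# Expected among mean square: sigma^2 + n*sum(alpha^2)/(k - 1).
e_among = sigma_sq + n * sum_alpha_sq / (k - 1)

# Expected total mean square, two ways: directly, and as the weighted
# combination of the expected pooled and among mean squares.
e_total = sigma_sq + n * sum_alpha_sq / (n * k - 1)
e_total_weighted = ((n * k - k) * sigma_sq + (k - 1) * e_among) / (n * k - 1)

print(e_among, float(e_among))
print(e_total, float(e_total))
```

Working in fractions avoids rounding questions: the two routes to the expected total mean square give the same rational number.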

Actually, we expect $s_a^2$ to be larger than 5.156 only about 2.5 per cent of the time, since, according to the $\chi^2$ distribution, we expect values larger than

$$\frac{(k - 1)s_a^2}{E(s_a^2)} = \frac{15.4690}{1.67} = 9.26$$

only 2.5 per cent of the time in repeated sampling. Similarly, we find that $s_T^2$ would be larger than 1.222 about 25 per cent of the time and $s_p^2$ would be smaller than 0.894 about 25 per cent of the time in repeated sampling.

For the moment, let us pretend that we do not know that the population means are different, and let us test by analysis of variance the hypothesis that the means are equal, that is, $H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$. Under the assumption of the null hypothesis, $s_a^2$ and $s_p^2$ are independent and unbiased estimators of the same variance $\sigma^2 = 1$. Thus, their ratio $s_a^2/s_p^2$ is distributed as F with 3 and 36 degrees of freedom. For a five per cent level test $F_{.05}(3, 36) = 2.88$, and the critical region is made up of all those values of F for which $F \geq 2.88$. For our particular samples, $F = 5.77$ falls in the critical region. Thus, we reject $H_0$ and conclude that the population means are not all equal. In making this conclusion, we normally take a five per cent chance of making the type 1 error, but since we know that the means are really different, we know that we do not make a type 1 error. Further, since we reject the null hypothesis, we do not have an opportunity to make the type 2 error.

Confidence intervals for means and effects may be found by the methods of Example 10.1. From Table VI and the above discussion, we find that $t_{.025}(36) = 2.03$ and $s_p^2 = 0.894$, $\bar{x}_{1\cdot} = 1.57$, $\bar{x}_{2\cdot} = 1.62$, $\bar{x}_{3\cdot} = 2.93$, $\bar{x}_{4\cdot} = 2.72$. Thus

$$t_{.025}(36)\,\frac{s_p}{\sqrt{n}} = 2.03\sqrt{\frac{0.894}{10}} = 0.61$$

so that the 95 per cent confidence intervals for $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ are

0.96 < μ₁ < 2.18
1.01 < μ₂ < 2.23
2.32 < μ₃ < 3.54
2.11 < μ₄ < 3.33

In each case, note that the true mean actually does fall in the indicated interval.
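The four intervals can be reproduced with a few lines of Python. This is a sketch only: the critical value 2.03 is taken from the text rather than computed, and the variable names are our own.

```python
import math

t_crit = 2.03                    # t_.025(36), as quoted from Table VI
pooled_var = 0.894               # pooled variance with 36 degrees of freedom
n = 10                           # observations per sample
sample_means = [1.57, 1.62, 2.93, 2.72]
true_means = [2.0, 2.2, 2.4, 2.6]

# Half-width of each 95 per cent interval: t * sqrt(s_p^2 / n).
half_width = t_crit * math.sqrt(pooled_var / n)

intervals = [(round(m - half_width, 2), round(m + half_width, 2))
             for m in sample_means]
covered = [low <= mu <= high for (low, high), mu in zip(intervals, true_means)]

print(round(half_width, 2))
for interval, ok in zip(intervals, covered):
    print(interval, ok)
```

Because the pooled variance is used for every sample, all four intervals have the same width.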

10.6. EXERCISES

10.1. For what purposes may we properly use the analysis of variance? 

10.2. Suppose we have the analysis of variance given in Table 10.6. (a) Write the appropriate model equation. (b) What null hypothesis was the experiment probably designed to test? Give statements for both the null and alternative hypotheses in terms of the symbols used in (a) and in words. (c) Use a five per cent level test of the null hypothesis in (b). Under what assumptions is this test valid? (d) What are the unbiased estimates of $\sigma^2 + 2\sum\alpha_i^2$ and $\sum\alpha_i^2$? What is the largest value any estimator $a_i$ can have in this experiment? (e) Find a 90 per cent confidence interval for $\sigma^2$.

Table 10.6

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | Expected Mean Squares
Among means         | 639            | 3                  | 213          | $\sigma^2 + 2\sum\alpha_i^2$
Within              | 600            | 20                 | 30           | $\sigma^2$
Total               | 1239           | 23                 |              |

10.3. An experiment was run to determine if five specific firing temperatures affect the density of bricks. The mean square for “among firing temperatures” was 1.23. Twelve bricks were used for each firing temperature, and the pooled variance “among bricks within firing temperature” was 0.64. (a) Make an analysis of variance table, showing all the things given in the analysis of variance table in Exercise 10.2. (b) Give statements for both the null and alternative hypotheses in terms of the experiment. Write the appropriate model equation and state both hypotheses in symbols. Test the null hypothesis for $\alpha = 0.05$. (c) Find a 95 per cent confidence interval for the error variance.

10.4. The following hypothetical data are for a completely randomized experiment (the student can supply his own interpretation).

Sample 1    Sample 2    Sample 3
   51          51          69
   47          35          59
   49          43          57
   63          56          51
   44          60          55
   51          39          72
   49          49          41
   43          60          54


(a) Prepare an analysis of variance table similar to Table 10.4. Use the computing formulas of Sect. 10.5. (b) Test the hypothesis that the three population means are equal, showing all steps in the general test procedure. (c) Find point estimates of the over-all mean, the three population means and effects, and the variance common to the three populations. (d) Find 95 per cent confidence intervals for the third population mean, and for the difference in the means of the first and third population. (e) Express each of the 24 observations as the sum of three estimated component parts (see illustration in Table 8.13a) and use these parts to compute the three sums of squares found in (a).

10.5. The following hypothetical data are for a completely randomized experiment (the student can supply his own interpretation).


Sample I Sample 2 Sample 3 Sample 4 
53 53 48 62 

« 43 59 15 

51 56 61 59 

45 59 52 50 

64 56 51 55 

66 54 57 

55 61 84 

68 53 

42 


42 


Follow the instructions given in Exercise 10.4 (a), (b), (c), (d), and (e).

10.6. Prove that the statistics defined in Eq. (10.18) are unbiased estimators of their corresponding parameters.

10.7. Prove Eq. (10.28).

10.8. Prove Eq. (10.36).

10.9. Prove Eq. (10.37).

10.10. In Exercise 10.4 Samples 1 and 2 were randomly drawn from a normal population with mean 50 and variance 100, and Sample 3 was randomly drawn from a normal population with mean 60 and variance 100. (a) Find $\mu$, $\alpha_1$, $\alpha_2$, and $\alpha_3$. Let these numerical values be denoted by $\mu_0$, $\alpha_{01}$, $\alpha_{02}$, and $\alpha_{03}$, respectively. Use this information and the data in Exercise 10.4 to verify the identity (10.32). That is, find the four sums of squares in Eq. (10.32). (b) Use the values found in (a) to test, at the five per cent level, the following two null hypotheses: $H_{01}$: $\alpha_i = \alpha_{0i}$ $(i = 1, 2, 3)$ and $H_{02}$: $\mu = \mu_0$. (c) Prepare an analysis of variance table similar to Table 10.3 and test, at the five per cent level, the null hypothesis given by (10.29). (d) Express each of the 24 observations in Exercise 10.4 as the sum of three true component parts (see illustration in Table 8.13b) and compute an estimate of the variance 100, using the $\epsilon_{ij}$. (e) Find the power of the test of the null hypothesis in (10.29) when the $\alpha$ effects are those found in (a). Find the power when $\alpha_1 = \alpha_2 = -5$ and $\alpha_3 = 10$.

10.11. Find power of the test in Exercise 10.6(c) when = and cr- = 40. 

10.12. In Exercise 10.5 Samples 1, 2, 3, and 4 were randomly drawn from normal populations with means 50, 60, 55, and 58, respectively, and common variance 100. Follow the instructions given in Exercise 10.10(a), (b), (c), (d), and (e), making necessary adjustments.

10.13. Prove Eq. (10.38). 


10.14. Prove Eq. (10.40). 

Hint. You may wish to check a reference. 

10.15. Suppose four normal populations have common variance $\sigma^2 = 50$ with means $\mu_1 = 50$, $\mu_2 = 50$, $\mu_3 = 60$, and $\mu_4 = 80$. How many random observations $n$ should one make on each of the populations so that a 0.05 level analysis of variance test of the hypothesis $\mu_1 = \mu_2 = \mu_3 = \mu_4$ will have a 0.90 chance of detecting differences?

10.16. In a completely randomized design with five random observations for 
each of seven treatments, we have the following coded data (the treat- 
ments might be such things as different temperatures, different velocities, 
different concentrations, different tensile strengths, different number 
of hours of exposure, different strains, and different methods of in- 
struction) 


                    Treatments

  1      2      3      4      5      6      7
 162    126    189    138    207    153    138
 143    114    213    156    153    147     81
 137    168    120    117    138    186    141
 113    144    189    126    147    144    114
 181    126    111    162    189    141    159

Follow the instructions given in Exercise 10.3(a), (b), (c), (d), and (e), making necessary adjustments.

10.7. SINGLE DEGREE OF FREEDOM

When the null hypothesis $\mu_1 = \mu_2 = \cdots = \mu_k = \mu$ is rejected, we conclude that there are differences among the means, but the particular nature of the differences is not specified by the F-test procedure described in Sect. 10.2. However, it is possible to test specific hypotheses involving linear combinations of any $p$ $(p = 2, \ldots, k)$ of the $k$ means $\mu_1, \ldots, \mu_k$. For example, in a set of ten means we may wish to compare the mean of one population, say the control, with the average of two other population means, say the two newest. That is, we may wish to test a hypothesis of the type

$$\mu_1 = \frac{\mu_2 + \mu_3}{2}$$

or

$$2\mu_1 - \mu_2 - \mu_3 = 0 \tag{10.55}$$

which involves a linear combination of means. Actually, such a linear combination is called a contrast or comparison, since the sum of the coefficients is zero, that is, $2 + (-1) + (-1) = 0$. In general, any linear combination of $k$ population means of the form

$$\gamma = m_1\mu_1 + \cdots + m_k\mu_k \tag{10.56}$$

is called a contrast or comparison of true means, provided

$$m_1 + m_2 + \cdots + m_k = 0 \quad\text{and some } m_i\ (i = 1, \ldots, k) \text{ is different from zero} \tag{10.57}$$

We know from Chap. 6 that the linear combination

$$c = m_1\bar{x}_{1\cdot} + m_2\bar{x}_{2\cdot} + \cdots + m_k\bar{x}_{k\cdot} \tag{10.58}$$

of sample means is normally distributed with mean

$$\mu_c = \gamma = m_1\mu_1 + m_2\mu_2 + \cdots + m_k\mu_k$$

and variance

$$\sigma_c^2 = \frac{m_1^2\sigma_1^2}{n_1} + \cdots + \frac{m_k^2\sigma_k^2}{n_k} \tag{10.59}$$

provided $\bar{x}_{1\cdot}, \ldots, \bar{x}_{k\cdot}$ are means of samples of sizes $n_1, \ldots, n_k$ which are randomly and independently drawn from normal populations with means $\mu_1, \ldots, \mu_k$ and variances $\sigma_1^2, \ldots, \sigma_k^2$, respectively. In particular, the theorem holds when the linear combination (10.58) is a contrast of sample means. Further, if all the variances are equal to $\sigma^2$, then Eq. (10.59) becomes

$$\sigma_c^2 = \sigma^2\Big(\frac{m_1^2}{n_1} + \cdots + \frac{m_k^2}{n_k}\Big) \tag{10.60}$$

It is clear that the statistic

$$\frac{c - \gamma}{\sigma_c} \tag{10.61}$$

is normally distributed with mean zero and variance one. Further, when all the population variances are equal to $\sigma^2$ and $s_p^2$ is an unbiased estimator of $\sigma^2$ so that

$$s_c^2 = s_p^2\sum_{i=1}^{k}\frac{m_i^2}{n_i} \tag{10.62}$$

it follows that

$$\frac{c - \gamma}{s_c} \tag{10.63}$$

is distributed as t with $\sum_{i=1}^{k}(n_i - 1)$ degrees of freedom, or

$$\frac{(c - \gamma)^2}{s_c^2} \tag{10.64}$$

is distributed as F with 1 and $\sum_{i=1}^{k}(n_i - 1)$ degrees of freedom.


Thus, in testing the null hypothesis, (10.56), that a linear combination, usually a contrast, of population means is zero, we use the statistic in Eq. (10.64). Under the assumption that (10.56) holds, Eq. (10.64) reduces to

$$F = \frac{c^2}{s_p^2\sum_i (m_i^2/n_i)} = \frac{Q^2}{s_p^2} \tag{10.65}$$

where

$$Q^2 = \frac{(m_1\bar{x}_{1\cdot} + \cdots + m_k\bar{x}_{k\cdot})^2}{\dfrac{m_1^2}{n_1} + \cdots + \dfrac{m_k^2}{n_k}} = \frac{\big(\sum_i m_i\bar{x}_{i\cdot}\big)^2}{\sum_i (m_i^2/n_i)} \tag{10.66}$$

Since $\bar{x}_{i\cdot} = T_i/n_i$ $(i = 1, \ldots, k)$, we may write Eq. (10.66) as

$$Q^2 = \frac{\Big(\dfrac{m_1T_1}{n_1} + \cdots + \dfrac{m_kT_k}{n_k}\Big)^2}{\dfrac{m_1^2}{n_1} + \cdots + \dfrac{m_k^2}{n_k}} \tag{10.67}$$

If $n_1 = \cdots = n_k = n$, this becomes

$$Q^2 = \frac{(m_1T_1 + \cdots + m_kT_k)^2}{n(m_1^2 + \cdots + m_k^2)} \tag{10.68}$$

Since $Q^2/s_p^2$ is distributed as F with 1 and $\sum(n_i - 1)$ degrees of freedom, and since $s_p^2$ is an unbiased estimator of $\sigma^2$ with $\sum(n_i - 1)$ degrees of freedom, we know that $Q^2$ and $s_p^2$ are independently distributed and $Q^2$ is a variance estimator with one degree of freedom. Thus, $Q^2$ is called a component of treatment sum of squares with an individual degree of freedom or, for short, a component with an individual degree of freedom. (It can be shown that $Q^2$ is a part of the treatment sum of squares.)

Sometimes, when making inferences about $k$ means, we are interested in two or more linear combinations, usually contrasts, in which case we need to compute the corresponding $Q^2$'s, each with one degree of freedom. To illustrate what has already been developed in this section and to introduce the concept of orthogonal comparisons, we consider the following example.

Example 10.4. Assume that, in addition to the information in Example 10.1, we know that the manufacturer B usually supplies buyer X with wire, and that manufacturer C is a new competitor on the copper wire market. If buyer X questions that he should continue to buy wire from B, he is likely to want to know the answers to the following two questions: Does B make wire with tensile strength the same as that of A and C? Is the tensile strength of the wire made by A the same as that made by C? To answer these questions, buyer X tests the hypotheses

$$H_{01}:\ \mu_B = \frac{\mu_A + \mu_C}{2} \quad\text{and}\quad H_{02}:\ \mu_A = \mu_C$$

or, the equivalent hypotheses

$$H_{01}:\ \mu_A - 2\mu_B + \mu_C = 0 \quad\text{and}\quad H_{02}:\ \mu_A - \mu_C = 0 \tag{10.69}$$

From the solution following Example 10.1, we have $s_p^2 = 1457$ with 27 degrees of freedom, $T_A = 920$, $T_B = 750$, $T_C = 1060$, and $n = 10$. Thus, the component sums of squares with an individual degree of freedom for testing the hypotheses $H_{01}$ and $H_{02}$ are

$$Q_1^2 = \frac{[920 - 2(750) + 1060]^2}{10[1^2 + (-2)^2 + 1^2]} = 3840$$

and

$$Q_2^2 = \frac{[920 + 0(750) - 1060]^2}{10[1^2 + 0^2 + (-1)^2]} = 980$$

respectively. For a five per cent level test $F_{.05}(1, 27) = 4.21$, so the critical region is made up of all F for which $F > 4.21$. For hypotheses $H_{01}$ and $H_{02}$ we have

$$F_1 = \frac{3840}{1457} = 2.64 \quad\text{and}\quad F_2 = \frac{980}{1457} = 0.67$$

respectively. Since neither value falls in the critical region, we fail to reject both hypotheses. That is, we conclude that the tensile strength of the wire made by B is not significantly different from the tensile strength of the wire made by A and C on the average, and that the new competitor C does not produce wire that is stronger or weaker than that made by A.
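The two components and their F ratios can be recomputed from the treatment totals. The Python sketch below assumes the equal-sample-size form of a component, the squared linear combination of totals divided by $n$ times the sum of squared multipliers; the function name is our own.

```python
from fractions import Fraction


def component_ss(multipliers, totals, n):
    """Q^2 = (sum m_i T_i)^2 / (n * sum m_i^2), equal sample sizes n."""
    c = sum(m * t for m, t in zip(multipliers, totals))
    return Fraction(c * c, n * sum(m * m for m in multipliers))


totals = [920, 750, 1060]        # T_A, T_B, T_C from Example 10.1
n = 10
pooled_var = 1457                # s_p^2 with 27 degrees of freedom
f_crit = 4.21                    # F_.05(1, 27), as quoted in the text

q1 = component_ss([1, -2, 1], totals, n)     # B against the average of A and C
q2 = component_ss([1, 0, -1], totals, n)     # A against C

f1 = float(q1) / pooled_var
f2 = float(q2) / pooled_var
print(q1, round(f1, 2), f1 > f_crit)
print(q2, round(f2, 2), f2 > f_crit)
```

Neither ratio exceeds the critical value, in agreement with the conclusion of the example.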

The reader should note that it is possible to reject either $H_{01}$ or $H_{02}$ without rejecting $\mu_A = \mu_B = \mu_C$, and that the rejection of $\mu_A = \mu_B = \mu_C$ does not necessarily imply the rejection of either $H_{01}$ or $H_{02}$, it being assumed that the same significance level $\alpha$ is used for each of the three hypotheses. However, if the hypothesis $H_0$: $\mu_A = \mu_B = \mu_C$ is correct, then $H_{01}$ and $H_{02}$ are correct, and conversely. This apparent inconsistency will be discussed in Sect. 10.9. At this time we wish to explain and illustrate what is meant by orthogonal comparisons.

In Example 10.4, the reader might have noticed that $Q_1^2 + Q_2^2 = TSS$, the treatment sum of squares. This did not just happen. It is always true, provided the two contrasts are such that the sum of the products of corresponding multipliers is equal to zero. For example, in our problem the multipliers are 1, $-2$, 1 and 1, 0, $-1$ with $1 \cdot 1 + (-2) \cdot 0 + 1 \cdot (-1) = 0$. Whenever this property holds, we say that the contrasts are orthogonal.

In general, two or more linear combinations are said to be orthogonal when

(a) Every combination is a contrast; that is, the sum of the multipliers for each combination is zero.

(b) The sum of products of the corresponding multipliers of every two different linear combinations is equal to zero.          (10.70)


Thus, if 

I Cl = niiiTi + niiiTi -f and 

I Cs — nijiTi tMisTj -t- tn^jT} 

are two linear combinations of treatment totals such that 


(10.71) 
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i FNii + »»i* + — 0 

iitjj -f- Wft + Wji — 0 (i072) 

m|,m,i + nt„mtt + m„ni}i = 0 

then Cl and Cj are said to be orthogonal or C, and Ct are orthogonal contrasts 
Also, Ci and C, are called independent contrasts due to the third restriction 
of (10 72) Since the set of three equations in (10 72) involves sia unknowns, 
we would expect that values can assigned arbitrarily to certain m's, and 
that this would lead to many different pairs of sets of multipliers satisfying 
(10 72) For example, suppose m,, = 1. m,t = 2, and m,, = -3 Then the 
system (10 72) reduces to the system 

r»f, + m,} + m»i = 0 
mji + 2m,, — 3m„ = 0 

which has the solutions 


m,i = —ik, m,, = Ak, m», = k 


where k is any real number not equal to zero When we use Eq (1068), it 
IS clear that Q* corresponding to the set of multipliers —Sk, Ak, k does not 
change with k For the factors out and cancels, leaving 


Ql = 


(-5T. + 47, + 7,)’ 
IOI(-5)’ + 4*^ l>j 


It is for this reason that we usually (when possible) choose $k$ so that each
multiplier in a set is an integer. Thus, for three means we see that when one
set of multipliers is selected then the second set of multipliers is uniquely
determined except for a constant factor, and the corresponding components
$Q_1^2$ and $Q_2^2$ are uniquely determined. Using Example 10.4 and the sets of
multipliers 1, 2, -3 and -5, 4, 1, we find

$$Q_1^2 = \frac{[920 + 2(750) - 3(1060)]^2}{10[1^2 + 2^2 + (-3)^2]} = \frac{28880}{7} = 4126$$

and

$$Q_2^2 = \frac{[-5(920) + 4(750) + 1060]^2}{10[(-5)^2 + 4^2 + 1^2]} = \frac{4860}{7} = 694$$

so that $Q_1^2 + Q_2^2 = 33740/7 = 4820 = \mathrm{TSS}$.
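The partition just computed can be checked numerically. The following is a minimal sketch, assuming the treatment totals 920, 750, 1060 and $n = 10$ quoted above, and assuming the usual equal-replication formula for the treatment sum of squares; the function names are illustrative only and do not come from the text.

```python
# Treatment totals and replication quoted from Example 10.4.
totals = [920, 750, 1060]
n = 10

def contrast_ss(multipliers, totals, n):
    """Single-degree-of-freedom component Q^2 = C^2 / (n * sum of m_i^2)."""
    c = sum(m * t for m, t in zip(multipliers, totals))
    return c * c / (n * sum(m * m for m in multipliers))

def is_orthogonal_pair(m1, m2):
    """Conditions (10.72): both sets sum to zero and cross products sum to zero."""
    return (sum(m1) == 0 and sum(m2) == 0
            and sum(a * b for a, b in zip(m1, m2)) == 0)

def treatment_ss(totals, n):
    """Assumed standard formula: sum(T_i^2)/n - (sum T_i)^2/(k*n)."""
    k = len(totals)
    return sum(t * t for t in totals) / n - sum(totals) ** 2 / (k * n)

tss = treatment_ss(totals, n)
for m1, m2 in [([1, -2, 1], [1, 0, -1]), ([1, 2, -3], [-5, 4, 1])]:
    q1 = contrast_ss(m1, totals, n)
    q2 = contrast_ss(m2, totals, n)
    print(m1, m2, is_orthogonal_pair(m1, m2), round(q1 + q2, 6), round(tss, 6))
```

Both orthogonal pairs reproduce the same treatment sum of squares, whereas a non-orthogonal pair generally would not.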

Since there are infinitely many ways in which the first of two sets of
multipliers can be selected, one might ask how a particular set should be
determined. The very emphatic answer is that the $m_i$'s should be chosen
before the experimenter looks at the data, and they should normally result
from something in the theory underlying the experiment. However, if the
data does suggest a given set of multipliers, then a second experiment could
be performed in order to test the specific hypothesis involving these means. 
Actually, in an experiment involving three means, say, the investigator may 
not know in advance which two orthogonal comparisons to examine, or he 
may wish to test more than two comparisons using a single set of data. In 
either case, F or t should not be used as the test statistic. Fortunately, there 
does exist a test procedure for this kind of problem, and it is described in 
Sect. 10.8. 

From the above discussion it should be clear that the treatment sum of 
squares for three means can be partitioned in many ways into two component 
sums of squares, each with a single degree of freedom, provided the asso- 
ciated linear combinations are orthogonal. In general, it can be shown that 
the treatment sum of squares for $k$ means can be partitioned in many ways
into $k - 1$ components, each with a single degree of freedom, provided the
associated linear combinations are mutually orthogonal. That is, if

$$C_j = \sum_{i=1}^{k} m_{ji}T_i \qquad (j = 1, \ldots, k - 1) \tag{10.73}$$

and

$$Q_j^2 = \frac{C_j^2}{n\sum_{i=1}^{k} m_{ji}^2} \tag{10.74}$$

then

$$\mathrm{TSS} = Q_1^2 + Q_2^2 + \cdots + Q_{k-1}^2 \tag{10.75}$$

provided $C_1, C_2, \ldots, C_{k-1}$ are mutually orthogonal contrasts of $k$ treatment
totals (each being the sum of $n$ random observations). Even though
orthogonal sets of multipliers $m_{j1}, m_{j2}, \ldots, m_{jk}$ may be selected in many ways,
it is the responsibility of the investigator to choose only those sets which
have meaning in the given experimental situation.

It should be noted that once $k - 1$ orthogonal sets of multipliers have
been selected, it is impossible to select another set which is orthogonal to
each of the first $k - 1$ sets. Actually, once $k - 2$ orthogonal sets are selected,
the $(k - 1)$st set is uniquely determined except for a constant multiplier.
For example, when $k = 3$ and one set of multipliers $m_{11}, m_{12}, m_{13}$ is selected,
a second set $m_{21}, m_{22}, m_{23}$ is uniquely determined except for a constant
multiplier, and a third set $m_{31}, m_{32}, m_{33}$ orthogonal to each of the first two
does not exist.
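For $k = 3$ the "uniquely determined" second set can be produced mechanically. The sketch below rests on an observation that is not part of the text: a vector orthogonal both to (1, 1, 1) (so that it is a contrast) and to the first set of multipliers is, up to a constant factor, their cross product. The helper names are hypothetical.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def second_contrast(first):
    """For k = 3: a set of multipliers summing to zero and orthogonal to `first`.

    Any nonzero multiple of the result is equally valid.
    """
    return cross([1, 1, 1], first)

print(second_contrast([1, 2, -3]))   # the set -5, 4, 1 used in the text
print(second_contrast([1, -2, 1]))   # a multiple of 1, 0, -1
```

Because the result is fixed up to a constant, a third set orthogonal to both would have to be orthogonal to three independent directions in three dimensions, which only the zero vector can be.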

The statistic $(c - \gamma)/s_c$ of Eq. (10.63) may be used to find a $100(1 - \alpha)$
per cent confidence interval for any linear combination

$$\gamma = m_1\mu_1 + \cdots + m_k\mu_k$$

The limits are given by

$$c \pm t_{\alpha}(\nu)\,s_c \tag{10.76}$$

where $c = m_1\bar{x}_1 + \cdots + m_k\bar{x}_k$.

Example 10.5. To find the 95 per cent symmetric confidence limits for

$$\gamma = \mu_A - 2\mu_B + \mu_C$$

using the information of Examples 10.1 and 10.4 we have

$$c = 92 - 2(75) + 106 = 48, \qquad s^2 = 1457, \qquad \sum m_i^2/n = 6/10,$$

and $t_{.05}(27) = 2.05$. Thus, the confidence limits are

$$48 \pm 2.05\sqrt{1457(6/10)} = 48 \pm 60.7$$

and the confidence interval is

$$-12.7 < \mu_A - 2\mu_B + \mu_C < 108.7$$
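The arithmetic of Example 10.5 can be reproduced as follows. This is a minimal sketch, assuming equal sample sizes and taking the critical value $t_{.05}(27) = 2.05$ from the text rather than computing it; `contrast_interval` is an illustrative name.

```python
import math

def contrast_interval(means, multipliers, s2, n, t_crit):
    """Point estimate and limits c +/- t * sqrt(s^2 * sum(m_i^2) / n)."""
    c = sum(m * x for m, x in zip(multipliers, means))
    half = t_crit * math.sqrt(s2 * sum(m * m for m in multipliers) / n)
    return c, c - half, c + half

# Means, error mean square and sample size quoted from Example 10.1.
c, lower, upper = contrast_interval([92, 75, 106], [1, -2, 1], 1457, 10, 2.05)
print(c, round(lower, 1), round(upper, 1))
```

The limits agree with those of the example to within rounding of the tabulated $t$ value.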

10.8 SIMULTANEOUS CONFIDENCE INTERVALS

We have seen how the $t$ and $F$ distributions may be used to establish
confidence intervals and to test hypotheses involving several means. In
particular, we have seen how the $t$ distribution may be used to construct
confidence intervals for specific linear combinations (usually, linear
contrasts) among several means. We learned that the number of meaningful
orthogonal contrasts is limited to the number of means minus one (that is,
$k - 1$). There are at least two reasons why an experimenter might wish to
remove this restriction. He might want to examine linear contrasts which are
not orthogonal or to examine more than $k - 1$ such comparisons. Thus,
it is only natural that he ask if it is possible to give a method for constructing
simultaneous confidence intervals for all possible linear contrasts among $k$
means. The answer is in the affirmative. In fact we describe two such
methods. One method depends on the distribution of the "studentized"
range [34] and the other, due to Scheffé [29], requires the use of the $F$
distribution. We also describe a method due to Dunn [7] for finding $m$
simultaneous confidence intervals. Roy and Bose [28] have also discussed the
problem of simultaneous confidence intervals. Before illustrating these
methods, we describe the sampling distribution of the range of $k$ means.

10.8.1. Distribution of the Range

It should be evident that the range can be used to measure the dispersion
of $k$ sample means and that $k + 1$ means are more dispersed on the average
than $k$ or fewer means. Thus, we expect the distribution of ranges to depend
on the number of means as well as the sample sizes. We describe the sampling 
distribution of the range in terms of a random sample of size n and then 
extend the ideas to a set of k independent sample means. 

Let $x_1, x_2, \ldots, x_n$ denote a random sample from a population with mean
$\mu$, variance $\sigma^2$, density function $f(x)$, and distribution function $F(x)$. If
$x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ denote the same values in increasing order of magnitude,
then the sample range $w$ is defined by

$$w = x_{(n)} - x_{(1)}$$

It can be shown that the density function $f_n(w)$ of $w$ is given by

$$f_n(w) = n(n - 1)\int_{-\infty}^{\infty}[F(x + w) - F(x)]^{n-2} f(x) f(x + w)\,dx \tag{10.77}$$

If $f(x)$ is a normal density function, then the standardized random variable
$W$ may be written as

$$W = \frac{w}{\sigma} = \frac{x_{(n)} - \mu}{\sigma} - \frac{x_{(1)} - \mu}{\sigma} \tag{10.78}$$

where $(x_{(n)} - \mu)/\sigma$ and $(x_{(1)} - \mu)/\sigma$ are standardized normal variables. The
density function $f_n(W)$ and distribution function $F_n(W)$ of the random variable
may be obtained from Eq. (10.77). Appropriate percentage points of both
$f_n(W)$ and $F_n(W)$ are tabulated in Ref. [27].
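Equation (10.77) can be evaluated numerically for a standard normal parent. The following is a rough sketch using the trapezoid rule from the standard library only; the integration limits and step counts are arbitrary choices made for illustration, not values from the text.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def range_density(n, w, lo=-8.0, hi=8.0, steps=800):
    """Trapezoid-rule evaluation of Eq. (10.77) with f = phi and F = Phi."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        term = (Phi(x + w) - Phi(x)) ** (n - 2) * phi(x) * phi(x + w)
        total += term if 0 < i < steps else 0.5 * term
    return n * (n - 1) * h * total

def total_probability(n, w_max=12.0, steps=240):
    """Integrate the range density over 0 <= w <= w_max (should be close to 1)."""
    h = w_max / steps
    values = [range_density(n, i * h) for i in range(steps + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

print(round(range_density(2, 0.0), 4))
print(round(total_probability(5), 4))
```

For $n = 2$ the range is the absolute difference of two observations, so the density at $w = 0$ should equal $1/\sqrt{\pi}$, which gives a simple check on the integration.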

Since in most practical problems $\sigma$ is unknown, we require the
distribution of the so-called "studentized" range $q$ defined by $q = w/s$. It is
understood that $w$ and $s$ are independent variates computed from the same
normal population. In this case we can write

$$q = \frac{w}{s} = \frac{W}{\sqrt{\chi^2/\nu}} \tag{10.79}$$

since $w$ is distributed as $\sigma W$ and $s$ as $\sigma\sqrt{\chi^2/\nu}$. Thus, the distribution of $q$
depends on the sample size $n$ from which $w$ is determined and on $\nu$, the
number of degrees of freedom of $s$. Table IX gives values, $q_{\alpha}$, which are
exceeded with probability $\alpha = 0.05$; $0.01$. That is, $q_{\alpha}$ is that value of $q$ for
which $P[q > q_{\alpha}] = \alpha$. We sometimes write $q(n, \nu)$ in place of $q$.

Let $\bar{x}_1, \ldots, \bar{x}_k$ denote the means of $k$ independent random samples,
each of size $n$. If $\bar{x}_{(1)}, \ldots, \bar{x}_{(k)}$ denote the same values arranged in
increasing order of magnitude, then the statistic

$$q = \frac{\bar{x}_{(k)} - \bar{x}_{(1)}}{s/\sqrt{n}} \tag{10.80}$$

is distributed as the studentized range $q(k, \nu)$, where $k$ is the number of
means and $\nu$ is the number of degrees of freedom of $s^2$, the independent
variance estimator of $\sigma^2$. The reader should note that for $k = 2$ we have
$\sqrt{2}\,t(\nu) = q(2, \nu)$.

Example 10.6. Use the studentized range and the information of
Example 10.1 to test at the five per cent level the hypothesis $\mu_A = \mu_B = \mu_C$.
Since $\bar{x}_A = 92$, $\bar{x}_B = 75$, $\bar{x}_C = 106$, $n = 10$, and $s^2 = 1457$, we have

$$q = \frac{106 - 75}{\sqrt{1457/10}} = 2.57$$

For $k = 3$, $\nu = 27$, and $\alpha = 0.05$, we find, using Table IX, that
$q_{.05}(3, 27) = 3.51$.

Therefore, the critical region is made up of those values of $q$ for which
$q > 3.51$. Since the computed studentized range falls in the noncritical
region, we fail to reject the null hypothesis, just as in Example 10.1.
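The test of Example 10.6 amounts to one division and one comparison. A minimal sketch, taking the critical value 3.51 from Table IX as quoted in the text (it is not computed here), with an illustrative function name:

```python
import math

def studentized_range(means, s2, n):
    """Statistic (10.80): (largest mean - smallest mean) / (s / sqrt(n))."""
    return (max(means) - min(means)) / math.sqrt(s2 / n)

q = studentized_range([92, 75, 106], 1457, 10)
q_crit = 3.51          # q_.05(3, 27), read from Table IX in the text
reject = q > q_crit
print(round(q, 2), reject)
```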

The studentized range may also be used to test simultaneously hypotheses
or to construct simultaneously confidence intervals of linear comparisons
of $k$ means. The following theorem is useful for these purposes.

Theorem 10.1. Let $\bar{x}_i$ ($i = 1, 2, \ldots, k$) be the mean of a random sample
of size $n$ drawn from a normal population with mean $\mu_i$ and variance $\sigma^2$. Let

$$c = \sum m_i\bar{x}_i$$

be any comparison of $k$ independent sample means and

$$\gamma = \sum m_i\mu_i$$

Let $s^2$ with $\nu$ degrees of freedom be any unbiased estimator of $\sigma^2$ which is
independent of the sample means. Then $1 - \alpha$ is the probability that all
comparisons $c$ simultaneously satisfy

$$-\frac{q_{\alpha}s}{\sqrt{n}}\Big(\tfrac{1}{2}\sum|m_i|\Big) < c - \gamma < \frac{q_{\alpha}s}{\sqrt{n}}\Big(\tfrac{1}{2}\sum|m_i|\Big) \tag{10.81}$$

where $q_{\alpha}$ is the upper $\alpha$ per cent studentized range value found in Table IX.

The inequality in (10.81) may be written in other ways. Perhaps the
simplest is the case where we make the sum of all the positive $m$'s equal to
one. Then, since $c$ is a contrast, the sum of all negative $m$'s is equal to
minus one, and (10.81) reduces to

$$-\frac{q_{\alpha}s}{\sqrt{n}} < \sum m_i\bar{x}_i - \sum m_i\mu_i < \frac{q_{\alpha}s}{\sqrt{n}} \tag{10.82}$$

The reader should satisfy himself that inequality (10.82) is not a restricted
form of (10.81). It should be noted that it is possible to make Theorem 10.1
more general by defining $c$ as a linear combination or by allowing the
population variances to be different or by assuming a nonzero covariance
between $\bar{x}_i$ and $\bar{x}_{i'}$ ($i \neq i'$) when c is a constant.


10.8.2. Examples of Simultaneous Confidence Intervals

We now use the data of Example 10.1 to illustrate and compare the 
studentized range and F procedures for finding simultaneous confidence 
intervals. We also compare the lengths of intervals of special comparisons 
obtained by these procedures with those obtained by the t distribution and 
illustrated in Sect. 10.7. 

Example 10.7. From Example 10.1 we have $\bar{x}_1 = 92$, $\bar{x}_2 = 75$, $\bar{x}_3 = 106$,
$n = 10$, and $s^2 = 1457$. Use (10.82) to find simultaneously 95 per cent
confidence intervals for an indefinite number of linear contrasts of $\mu_1$, $\mu_2$,
and $\mu_3$.

Since $q_{.05} = 3.51$ when $k = 3$ and $\nu = 27$, we find

$$q_{\alpha}\frac{s}{\sqrt{n}} = 3.51\sqrt{\frac{1457}{10}} = 42.5$$


Table 10.7

Confidence Limits of Contrasts of Three Means: Range Method

                                     Multipliers       Value of                  95 Per Cent
                                                       $\sum m_i\bar{x}_i$       Confidence Limits
Contrasts                        $m_1$  $m_2$  $m_3$                             Lower     Upper
(1)                                     (2)               (3)                    (4)       (5)

$\mu_1 - \mu_2$                    1     -1      0         17                    -25.5      59.5
$\mu_1 - \mu_3$                    1      0     -1        -14                    -56.5      28.5
$\mu_2 - \mu_3$                    0      1     -1        -31                    -73.5      11.5
$(\mu_1 + \mu_2)/2 - \mu_3$       1/2    1/2    -1        -22.5                  -65.0      20.0
$(\mu_1 + \mu_3)/2 - \mu_2$       1/2    -1     1/2        24                    -18.5      66.5
$(\mu_2 + \mu_3)/2 - \mu_1$       -1     1/2    1/2        -1.5                  -44.0      41.0
$\mu_1/3 + 2\mu_2/3 - \mu_3$      1/3    2/3    -1        -25.3                  -67.8      17.2
$\mu_1/3 + 2\mu_3/3 - \mu_2$      1/3    -1     2/3        26.3                  -16.2      68.8
$\mu_2/3 + 2\mu_1/3 - \mu_3$      2/3    1/3    -1        -19.7                  -62.2      22.8
$\mu_2/3 + 2\mu_3/3 - \mu_1$      -1     1/3    2/3         3.7                  -38.8      46.2
$\mu_3/3 + 2\mu_1/3 - \mu_2$      2/3    -1     1/3        21.7                  -20.8      64.2
$\mu_3/3 + 2\mu_2/3 - \mu_1$      -1     2/3    1/3        -6.7                  -49.2      35.8
etc.

Then, according to Theorem 10.1, we are 95 per cent confident of being
correct in all statements of the form

$$-42.5 < \sum m_i\bar{x}_i - \sum m_i\mu_i < 42.5$$

or

$$\sum m_i\bar{x}_i - 42.5 < \sum m_i\mu_i < \sum m_i\bar{x}_i + 42.5 \tag{10.83}$$

it being understood that the sum of the positive $m$'s is equal to one.

Typical contrasts among three population means, along with multipliers,
value of contrasts of sample means, and confidence limits, are shown in
Table 10.7. We are at least 95 per cent confident that the contrasts listed in
column (1) of Table 10.7 have values between the limits shown in columns
(4) and (5). Further, we are 95 per cent confident that all the contrasts of
Table 10.7, along with as many others as we wish to write, have values falling
between the limits obtained by substituting in inequality (10.83).
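Rows of Table 10.7 can be generated directly from inequality (10.81). The sketch below assumes the values of Example 10.1 and the tabulated $q_{.05}(3, 27) = 3.51$; exact fractions are used for the multipliers so that contrasts such as 1/3, 2/3, -1 sum to zero exactly. Note that $3.51\sqrt{145.7}$ evaluates to about 42.4, so the limits printed here can differ from the tabled ones (which use 42.5) by about 0.1.

```python
import math
from fractions import Fraction as Fr

means = [92, 75, 106]      # sample means from Example 10.1
s2, n = 1457, 10           # error mean square and common sample size
q_crit = 3.51              # q_.05(3, 27), read from Table IX in the text

def range_interval(multipliers):
    """Limits c +/- q * s/sqrt(n) * (1/2) * sum|m_i|, following (10.81)."""
    c = float(sum(m * x for m, x in zip(multipliers, means)))
    scale = float(sum(abs(m) for m in multipliers)) / 2
    half = q_crit * math.sqrt(s2 / n) * scale
    return c, c - half, c + half

contrasts = [
    [1, -1, 0],
    [Fr(1, 2), -1, Fr(1, 2)],
    [-1, Fr(1, 2), Fr(1, 2)],
    [Fr(1, 3), Fr(2, 3), -1],
]
for m in contrasts:
    c, lower, upper = range_interval(m)
    print([str(x) for x in m], round(c, 1), round(lower, 1), round(upper, 1))
```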

It should be noted at this point that, when a $100(1 - \alpha)$ per cent
confidence interval is required for a specific contrast, the method used in
Example 10.5 gives a shorter confidence interval than the method just
described. For example, applying the $t$ distribution and Example 10.5, we obtain
for the contrast $\tfrac{1}{2}(\mu_A + \mu_C) - \mu_B$ the interval

$$-6.35 \le \tfrac{1}{2}(\mu_A + \mu_C) - \mu_B \le 54.35$$

and, applying the studentized range and Table 10.7, we find that the
interval is

$$-18.5 \le \tfrac{1}{2}(\mu_A + \mu_C) - \mu_B \le 66.5$$

Clearly, the $t$ distribution should be used if a single contrast is of interest.
If more than two contrasts are of interest, the two methods are not
comparable, because the confidence coefficients (when 95 per cent procedures are
used) are different. This is discussed in Sect. 10.9.

The studentized range procedure for finding simultaneous confidence
intervals is very useful when certain contrasts are suggested by the data.
In this way an exploratory collection of data can be used to suggest contrasts
which may be examined by other techniques in future experiments.

According to Scheffé [29], simultaneous intervals for any number of
linear contrasts $\gamma$ among $k$ means are given by

$$c - S\,s\sqrt{\sum m_i^2/n_i} \le \gamma \le c + S\,s\sqrt{\sum m_i^2/n_i} \tag{10.84}$$

where $S^2 = (k - 1)F_{\alpha}(k - 1, \nu)$ and $s^2$, $c$, $m_i$, and $n_i$ are defined as before.

Example 10.8. From Example 10.1 we have $\bar{x}_1 = 92$, $\bar{x}_2 = 75$, $\bar{x}_3 = 106$,
$n = 10$, and $s^2 = 1457$. (a) Use (10.84) to find simultaneously 95 per cent
confidence intervals for an indefinite number of linear contrasts of $\mu_1$, $\mu_2$,
and $\mu_3$. (b) Compare the results of (a) with Table 10.7.

Since $F_{.05}(2, 27) = 3.35$, we have

$$S^2 = 6.70 \quad\text{and}\quad S\sqrt{s^2/n} = 31.2$$

Thus, according to (10.84), we are 95 per cent confident of being correct in
all statements of the form

$$\sum m_i\bar{x}_i - 31.2\sqrt{\sum m_i^2} < \sum m_i\mu_i < \sum m_i\bar{x}_i + 31.2\sqrt{\sum m_i^2} \tag{10.85}$$

The intervals (10.85) are to be compared with the intervals (10.83).

The contrasts of Table 10.7, as well as values of
$c = m_1\bar{x}_1 + m_2\bar{x}_2 + m_3\bar{x}_3$, $\sum m_i^2$, $31.2\sqrt{\sum m_i^2}$, and simultaneous 95 per cent
confidence limits for Scheffé's method, are shown in Table 10.8. We are at least
95 per cent confident that the contrasts listed in column (1) of Table 10.8
have values between the limits shown in columns (5) and (6). Further, we are
95 per cent confident that all the contrasts in Table 10.8, along with as many
others as we wish to write, have values falling between the limits obtained
by substituting in (10.85).
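Scheffé's limits follow the same pattern as the range limits, with $\sqrt{\sum m_i^2}$ in place of $\tfrac{1}{2}\sum|m_i|$. A minimal sketch, assuming equal sample sizes and the tabulated $F_{.05}(2, 27) = 3.35$ quoted above; `scheffe_interval` is an illustrative name.

```python
import math

def scheffe_interval(means, multipliers, s2, n, f_crit):
    """Limits c +/- S * sqrt(s^2 * sum(m_i^2) / n), with S^2 = (k - 1) * F."""
    k = len(means)
    big_s = math.sqrt((k - 1) * f_crit)
    c = sum(m * x for m, x in zip(multipliers, means))
    half = big_s * math.sqrt(s2 * sum(m * m for m in multipliers) / n)
    return c, half, c - half, c + half

means, s2, n, f_crit = [92, 75, 106], 1457, 10, 3.35
for m in ([1, -1, 0], [0.5, -1, 0.5], [0.5, 0.5, -1]):
    c, half, lower, upper = scheffe_interval(means, m, s2, n, f_crit)
    print(m, round(c, 1), round(half, 1), round(lower, 1), round(upper, 1))
```

The half-widths come out near 44.2 for pairwise differences and 38.3 for contrasts of the form (1/2, 1/2, -1), matching column (4) of Table 10.8 up to rounding of the factor 31.2.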


Table 10.8

Confidence Limits of Contrasts of Three Means: Scheffé's Method

                                 Value of                                          95 Per Cent
                                    $c$     $\sum m_i^2$   $31.2\sqrt{\sum m_i^2}$   Confidence Limits
Contrasts                                                                          Lower     Upper
(1)                                 (2)       (3)            (4)                   (5)       (6)

$\mu_1 - \mu_2$                     17         2             44.1                  -27.1      61.1
$\mu_1 - \mu_3$                    -14         2             44.1                  -58.1      30.1
$\mu_2 - \mu_3$                    -31         2             44.1                  -75.1      13.1
$(\mu_1 + \mu_2)/2 - \mu_3$        -22.5      6/4            38.2                  -60.7      15.7
$(\mu_1 + \mu_3)/2 - \mu_2$         24        6/4            38.2                  -14.2      62.2
$(\mu_2 + \mu_3)/2 - \mu_1$         -1.5      6/4            38.2                  -39.7      36.7
$\mu_1/3 + 2\mu_2/3 - \mu_3$       -25.3     14/9            38.9                  -64.2      13.6
$\mu_1/3 + 2\mu_3/3 - \mu_2$        26.3     14/9            38.9                  -12.6      65.2
$\mu_2/3 + 2\mu_1/3 - \mu_3$       -19.7     14/9            38.9                  -58.6      19.2
$\mu_2/3 + 2\mu_3/3 - \mu_1$         3.7     14/9            38.9                  -35.2      42.6
$\mu_3/3 + 2\mu_1/3 - \mu_2$        21.7     14/9            38.9                  -17.2      60.6
$\mu_3/3 + 2\mu_2/3 - \mu_1$        -6.7     14/9            38.9                  -45.6      32.2
etc.







On comparing Tables 10.7 and 10.8, we observe that some intervals are
shorter when the range method is applied and some are shorter when
Scheffé's method is applied. But the two methods lead to the same results on
the average. That is, out of all possible contrasts, 95 per cent of the intervals
actually do contain the true contrast of population means when the range
method is applied, and when Scheffé's method is applied.
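Which method gives the shorter interval for a given contrast depends only on the multipliers, since both half-widths share the factor $s/\sqrt{n}$. The comparison below is a sketch under the values used in Examples 10.7 and 10.8 ($k = 3$, $q_{.05} = 3.51$, $F_{.05} = 3.35$); the function name is hypothetical.

```python
import math

K = 3
Q_CRIT = 3.51      # q_.05(3, 27) from Table IX, as quoted in the text
F_CRIT = 3.35      # F_.05(2, 27), as quoted in the text

def shorter_method(multipliers):
    """Compare the half-width factors of (10.81) and (10.84), dropping s/sqrt(n)."""
    range_factor = Q_CRIT * sum(abs(m) for m in multipliers) / 2
    scheffe_factor = math.sqrt((K - 1) * F_CRIT) * math.sqrt(
        sum(m * m for m in multipliers))
    return "range" if range_factor < scheffe_factor else "scheffe"

print(shorter_method([1, -1, 0]))        # a difference between two means
print(shorter_method([0.5, 0.5, -1]))    # one mean against the average of two
```

In these data the range method is shorter for simple differences between two means, while Scheffé's method is shorter for the contrasts that involve all three means.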

Scheffé's method for simultaneously finding confidence intervals can be
generalized to linear combinations of sample means computed from different
size samples for which $\bar{x}_i$ and $\bar{x}_{i'}$ ($i \neq i'$) are correlated and have different
variances. This method for finding intervals is applicable when the range
method is applicable, and it is weak when the range method is weak.

It might appear that considerably more computation is required to obtain
confidence intervals by Scheffé's method, since the factor $\sqrt{\sum m_i^2}$ appears in
(10.85) but not in (10.83). This is misleading. We used the form (10.85) for
finding limits because we wanted to compare the new limits with those for
the same contrasts which are given in Table 10.7. If (10.85) had been
introduced first, we could have required that $\sum m_i^2 = 1$ in order to simplify the
computation.

10.8.3. A Method for Finding m Simultaneous Confidence Intervals

Sometimes an experimenter, before collecting data, selects a number, say
$m$, of linear combinations (usually contrasts) among $k$ means which he would
like to estimate with confidence intervals. The range and $F$ methods just
presented could be used, but it is possible to describe a method, using the
$t$ distribution, which gives shorter intervals in some instances. Dunn [7]
describes this method, pointing out in the comparison between Scheffé's
intervals (or range intervals) and the $t$ intervals that the $t$ method is more
favorable when, if all other variables except one are assumed to be held
constant, (1) $k$ is increased, or (2) $\nu$ is increased, or (3) $1 - \alpha$ is increased.

It should be emphasized that when the $t$ intervals are used the set of
linear combinations which are to be estimated must be planned in advance,
whereas with Scheffé's intervals (or range intervals) they may be selected after
looking at the data. According to Dunn, the $100(1 - \alpha)$ per cent confidence
intervals for a set of linear combinations selected in advance are given by

$$c \pm t's\sqrt{\sum m_i^2/n_i} \tag{10.86}$$

where $c$, $s^2$, $m_i$, $n_i$ are defined as before, the $\bar{x}_i$'s are independently
distributed, and $t'$ is defined by

$$\int_{-t'}^{t'} f(t, \nu)\,dt = 1 - \frac{\alpha}{m} \tag{10.87}$$

$f(t, \nu)$ being the density function for a Student $t$ variate with $\nu$ degrees of
freedom. Values of $t'$ for $\alpha = 0.05$ and $0.01$ and selected values of $\nu$ and
$m$ have been computed by Dunn and are reproduced in Table 10.9.

Table 10.9*

Values of t' to be Used with m Linear Combinations

$\alpha = 0.05$

  m \ ν      5      7     10     12     15     20     24     30     40     ∞

    2      3.17   2.84   2.64   2.56   2.49   2.42   2.39   2.36   2.33   2.24
    3      3.54   3.13   2.87   2.78   2.69   2.61   2.58   2.54   2.50   2.39
    4      3.81   3.34   3.04   2.94   2.84   2.75   2.70   2.66   2.62   2.50
    5      4.04   3.50   3.17   3.06   2.95   2.85   2.80   2.75   2.71   2.58
    6      4.22   3.64   3.28   3.15   3.04   2.93   2.88   2.83   2.78   2.64
    7      4.38   3.76   3.37   3.24   3.11   3.00   2.94   2.89   2.84   2.69
    8      4.53   3.86   3.45   3.31   3.18   3.06   3.00   2.94   2.89   2.74
    9      4.66   3.95   3.52   3.37   3.24   3.11   3.05   2.99   2.93   2.77
   10      4.78   4.03   3.58   3.43   3.29   3.16   3.09   3.03   2.97   2.81
   15      5.25   4.36   3.83   3.65   3.48   3.33   3.26   3.19   3.12   2.94
   20      5.60   4.59   4.01   3.80   3.62   3.46   3.38   3.30   3.23   3.02
   25      5.89   4.78   4.15   3.93   3.74   3.55   3.47   3.39   3.31   3.09
   30      6.15   4.95   4.27   4.04   3.82   3.63   3.54   3.46   3.38   3.15
   35      6.36   5.09   4.37   4.13   3.90   3.70   3.61   3.52   3.43   3.19
   40      6.56   5.21   4.45   4.20   3.97   3.76   3.66   3.57   3.48   3.23
   45      6.70   5.31   4.53   4.26   4.02   3.80   3.70   3.61   3.51   3.26
   50      6.86   5.40   4.59   4.32   4.07   3.85   3.74   3.65   3.55   3.29
  100      8.00   6.08   5.06   4.73   4.42   4.15   4.04   3.90   3.79   3.48
  250      9.68   7.06   5.70   5.27   4.90   4.56   4.4†   4.2†   4.1†   3.72

* This table is reproduced from Olive Jean Dunn, "Multiple Comparisons Among
Means," Journal of the American Statistical Association, Vol. 56 (1961), p. 55, Tables
1 and 2, with permission of the editor of the journal.
† Obtained by graphical interpolation.

Example 10.9. From Example 10.1 we have $\bar{x}_1 = 92$, $\bar{x}_2 = 75$, $\bar{x}_3 = 106$,
$n = 10$, and $s^2 = 1457$. Use (10.86) to find simultaneously 95 per cent
confidence intervals for $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, and $\mu_2 - \mu_3$.

For each of these contrasts $\sum m_i^2 = 2$. With $m = 3$ and $\nu = 27$ we use
Table 10.9 to find, by linear interpolation, that $t'_{.05} = 2.56$. Therefore,
$t's\sqrt{\sum m_i^2/n} = 43.8$, and the intervals are

$$-26.8 < \mu_1 - \mu_2 < 60.8$$

$$-57.8 < \mu_1 - \mu_3 < 29.8$$

$$-74.8 < \mu_2 - \mu_3 < 12.8$$

It should be observed that these intervals have roughly the same lengths as
those given by the $F$ and range methods. This is not always the case, since
the relative lengths depend on the size of $m$ as well as $k$, $\nu$, and $\alpha$.
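The steps of Example 10.9 can be sketched as follows. The interpolation uses the two neighboring entries of Table 10.9 for $m = 3$ ($\nu = 24$ and $\nu = 30$). As a side check that relies on an assumption not stated in the text, namely that the $\nu = \infty$ column corresponds to the normal distribution, the entry for $m$ combinations should equal the normal quantile at $1 - \alpha/(2m)$.

```python
import math
from statistics import NormalDist

def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Table 10.9, row m = 3: t' = 2.58 at nu = 24 and 2.54 at nu = 30.
t_prime = interpolate(27, 24, 2.58, 30, 2.54)

s2, n = 1457, 10
half = t_prime * math.sqrt(s2 * 2 / n)      # sum of m_i^2 = 2 for each difference
differences = {"mu1 - mu2": 92 - 75, "mu1 - mu3": 92 - 106, "mu2 - mu3": 75 - 106}
for name, c in differences.items():
    print(name, round(c - half, 1), round(c + half, 1))

def t_prime_large_sample(m, alpha=0.05):
    """Normal-theory value satisfying (10.87) when nu is infinite (an assumption)."""
    return NormalDist().inv_cdf(1 - alpha / (2 * m))

print(round(t_prime_large_sample(2), 2), round(t_prime_large_sample(3), 2))
```

The computed half-width is about 43.7, so the printed limits can differ from those of the example by about 0.1.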

In this section, Sect. 10.8, we have generally restricted our attention to
confidence-interval problems. However, what has been said about
simultaneous confidence intervals can easily be extended to simultaneous hypotheses
involving sets of linear contrasts. References are made to such hypotheses
in Sect. 10.9.

In many investigations one is not necessarily interested in estimating
intervals or in testing hypotheses about some set of orthogonal linear
contrasts. On the other hand, the investigator often feels, and rightly so,
that methods for working with all linear combinations are too all-inclusive.
For example, in an experiment involving four means, say, one often requires
information concerning all pairs. That is, for means $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$, the linear
contrasts $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, $\mu_1 - \mu_4$, $\mu_2 - \mu_3$, $\mu_2 - \mu_4$, and $\mu_3 - \mu_4$ are the
only ones of interest. For such problems we ask if it is not possible to
describe procedures for testing hypotheses and constructing confidence
intervals which are superior to any already introduced. The answer is in the
affirmative, but the details are not completely resolved.

10.9 MULTIPLE TEST PROCEDURES FOR SEVERAL PAIRS OF MEANS

In an analysis of variance, the problem of testing for differences between
pairs selected from among several means often arises. Testing the
homogeneity of a set of means by the $F$ test may result in the conclusion that they
are not all alike, but it fails to signify any arrangement of distinguishable
groups among the means. The problem of separating a group of
heterogeneous means into subgroups of nonheterogeneous means has been
approached in several different ways by a number of research workers. A
common procedure, actually the oldest one, used for this purpose is the
least-significant-difference test, described by Goulden [8], Davies [3], and
others. There are other more recent tests which are superior to the
least-significant-difference test, commonly called the l.s.d. test. We shall explain
in detail how to apply one of the best multiple test procedures and show how
it is superior to the least-significant-difference test by introducing a discussion
on errors. Before proceeding with a comparison of the tests, we will state
the assumptions generally made in applying the tests and give the example
to be used in illustrating these differences.

Assume that we have a set of $k$ means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ determined from
random samples drawn independently from normal populations with
means $\mu_1, \mu_2, \ldots, \mu_k$, respectively. Assume that the sampling distributions
of the means have a common variance $\sigma_{\bar{x}}^2$ and that $s_{\bar{x}}^2$ is an unbiased
estimator of $\sigma_{\bar{x}}^2$.

For the illustration assume that the mean responses of each of eight
treatments replicated four times in a one-way classification design and
arranged in increasing order of magnitude are

$$\begin{array}{cccccccc}
\bar{x}_4 & \bar{x}_1 & \bar{x}_8 & \bar{x}_6 & \bar{x}_5 & \bar{x}_2 & \bar{x}_7 & \bar{x}_3 \\
1.2 & 2.7 & 3.9 & 7.2 & 8.2 & 10.3 & 10.9 & 13.1
\end{array} \tag{10.88}$$

The error mean square is

$$s^2 = 27.66$$

with $\nu = 24$ degrees of freedom, and the standard error of means is
$s_{\bar{x}} = \sqrt{27.66/4} = 2.63$. These means actually represent coded average yields
in pounds per plot (two rows 25 feet long) of eight varieties of sweet
potatoes. However, the reader may think of these numbers as coded means
of tensile strength of eight types of wire, rainfall in August for eight years,
expenditure per pupil per year in public schools in eight states, etc.

If we are interested only in testing to determine whether the difference
in the response of treatments 1 and 2 is significant, the best test is the usual
$t$ test, if it is assumed that these treatments were selected before the
experiment was run. We compute

$$t = \frac{\bar{x}_2 - \bar{x}_1}{s_{\bar{x}_2 - \bar{x}_1}} = \frac{10.3 - 2.7}{3.72} = 2.04$$

and compare this value with the two-tailed five per cent $t$ value with 24
degrees of freedom, that is, with $t_{.05}(24) = 2.06$. Or we may compare
$\bar{x}_2 - \bar{x}_1 = 7.6$ with $s_{\bar{x}}\sqrt{2}\,t_{.05}(24) = (2.63)(2.91) = 7.7$. In either case we
conclude that the difference in the mean response of treatments 1 and 2 is not
significantly different from zero. If the five per cent level test is satisfactory
with the experimenter, there is no room for further worry, because the $t$
test is the most powerful test, that is, "best test," in the sense that this test
will accept the alternative hypothesis more often than any other test when
the alternative hypothesis is actually true.
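The two equivalent forms of this comparison can be sketched in a few lines. The error mean square, the number of replications, and the tabulated $t_{.05}(24) = 2.06$ are taken from the text; the variable names are illustrative.

```python
import math

s2, reps = 27.66, 4
t_crit = 2.06                       # two-tailed five per cent t value, 24 d.f. (from the text)

s_mean = math.sqrt(s2 / reps)       # standard error of a mean
s_diff = s_mean * math.sqrt(2)      # standard error of a difference of two means
t_stat = (10.3 - 2.7) / s_diff      # treatments 2 and 1
lsd = s_diff * t_crit               # smallest difference declared significant

print(round(s_mean, 2), round(s_diff, 2), round(t_stat, 2), round(lsd, 1))
print(t_stat > t_crit, (10.3 - 2.7) > lsd)
```

Comparing $t$ with 2.06 and comparing the difference 7.6 with 7.7 are the same test, so the two printed decisions always agree.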

Usually, in an experiment involving eight means, we would be interested
in testing several (or all) hypotheses of the form $H_0\colon \mu_i = \mu_j$ ($i \neq j$; $i, j = 1, \ldots, 8$).
Suppose we want to make a statement about each difference between pairs
of means and suppose we agree that we want an over-all test corresponding
to the five per cent test given above. For this purpose we will consider the
multiple $t$ test, the l.s.d. test, Newman-Keuls' test [15, 25], and Duncan's
multiple range test [4] in some detail. Test procedures described by Hartley
[14] and Tukey [32, 33, 34] are not discussed.

A natural test to consider is provided by the joint application of $\alpha$ level
$t$ tests to all hypotheses $H_0$, using the rule which follows: if
$\bar{x}_i - \bar{x}_j > s_{\bar{x}}\sqrt{2}\,t_{\alpha}(\nu)$, make the decision that $\mu_i > \mu_j$; if
$\bar{x}_i - \bar{x}_j < s_{\bar{x}}\sqrt{2}\,t_{\alpha}(\nu)$, make the decision that $\mu_i = \mu_j$, that is, there is no
significant difference in $\bar{x}_i$ and $\bar{x}_j$. This procedure is often referred to as an $\alpha$
level multiple $t$ test. In our example, the results of the multiple $t$ test may
be shown as follows:

\bar{x}_4   \bar{x}_1   \bar{x}_8   \bar{x}_6   \bar{x}_5   \bar{x}_2   \bar{x}_7   \bar{x}_3
1.2         2.7         3.9         7.2         8.2         10.3        10.9        13.1
---------------------------------------------------------
            ---------------------------------------------------------
                        ---------------------------------------------------------
                                    ---------------------------------------------------------

Any two means not underscored by the same line are significantly different.
Any two means underscored by the same line are not significantly different.
That is, in testing each difference between two means, we determine that

$$\mu_2 > \mu_4; \qquad \mu_7 > \mu_4,\ \mu_1; \qquad \mu_3 > \mu_4,\ \mu_1,\ \mu_8$$

all other pairs of means being declared not significantly different.
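Applying the multiple $t$ rule to every pair is a simple loop. The sketch below uses the means of (10.88) and the least significant difference $s_{\bar{x}}\sqrt{2}\,t_{.05}(24)$; the treatment labels attached to the ordered means are an assumption where the printed labels are not fully legible (only treatments 1 and 2 are identified by the surrounding text).

```python
import math
from itertools import combinations

# Ordered means of (10.88); keys are the assumed treatment labels.
means = {4: 1.2, 1: 2.7, 8: 3.9, 6: 7.2, 5: 8.2, 2: 10.3, 7: 10.9, 3: 13.1}

s_mean = math.sqrt(27.66 / 4)
lsd = s_mean * math.sqrt(2) * 2.06      # t_.05(24) = 2.06, as quoted in the text

def significant_pairs(means, lsd):
    """Pairs (larger, smaller) whose difference in means exceeds the l.s.d."""
    pairs = set()
    for a, b in combinations(means, 2):
        hi, lo = (a, b) if means[a] >= means[b] else (b, a)
        if means[hi] - means[lo] > lsd:
            pairs.add((hi, lo))
    return pairs

result = significant_pairs(means, lsd)
print(sorted(result))
```

With eight means there are 28 pairs; under this rule six of them are declared significantly different.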

The multiple $t$ test is not recommended for use but is introduced because
of its simplicity and because some undesirable features of other multiple
test procedures are easy to discuss in terms of this test. The principal
disadvantage in using this test is brought out in the following discussion. In
drawing a random sample of size $n$ ($n > 2$) from a normal population,
we can expect the difference between the largest and the smallest observations
to be greater than the difference between two randomly chosen observations
on the average. (This could also be the case for the difference between less
extreme observations.) Also, when $k$ ($k > 2$) independent means are
computed from the same normal population, we can expect the difference
between the largest and smallest mean to be greater than the difference
between two randomly chosen means. That is, the dispersion of the sampling
distribution of differences, $d_k$, between extreme means in a set of $k$ means
is greater than the dispersion of the sampling distribution of differences, $d_2$,
between two random means. (The reader should realize that standardized
values of $d_k$ and $d_2$ are actually studentized range variates.) Further, the
larger the value of $k$, the larger the dispersion of $d_k$ relative to the dispersion
of $d_2$. For this reason it is possible in a case where two extreme means are
not significantly different to make the claim that they are significantly
different when the $t$ distribution (that is, the $d_2$ distribution) is used
incorrectly. (This can also happen for differences between means which are
not at the extremes.) This means that if all the means involved are
homogeneous, the multiple $t$ test has a large error probability of wrongly rejecting
the null hypothesis that all the means are equal; that is, $H_0\colon \mu_1 = \cdots = \mu_k$.
For the above five per cent level multiple $t$ test with eight means and 24
error degrees of freedom, the error probability of wrongly rejecting the
hypothesis is greater than 40 per cent. The error probability of wrongly
rejecting the null hypothesis that seven of the means are equal is roughly
40 per cent; the error probability of wrongly rejecting the hypothesis that
six of the means are equal is in the neighborhood of 35 per cent, and so on.
These percentages were found by using Pearson and Hartley's [27] tables
of the studentized range.
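The size of this error probability can be approximated by simulation instead of tables. The following is a rough Monte Carlo sketch, not the calculation used in the text: it draws $k$ standardized means from one normal population, an independent variance estimate with $\nu$ degrees of freedom, and counts how often the extreme pair exceeds the two-mean criterion $\sqrt{2}\,t_{.05}(24)$. The seed and the number of repetitions are arbitrary.

```python
import math
import random

def multiple_t_error_rate(k, nu=24, t_crit=2.06, reps=10000, seed=1):
    """Estimated chance that the largest and smallest of k homogeneous means
    are declared different by the two-mean t criterion."""
    rng = random.Random(seed)
    threshold = math.sqrt(2.0) * t_crit
    hits = 0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]             # means in units of sigma
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))  # chi-square with nu d.f.
        s = math.sqrt(chi2 / nu)                                 # s / sigma
        if (max(z) - min(z)) / s > threshold:
            hits += 1
    return hits / reps

rate_two = multiple_t_error_rate(2)
rate_eight = multiple_t_error_rate(8)
print(rate_two, rate_eight)
```

For two means the estimate stays near the nominal five per cent, while for eight means it rises to several times that level, in line with the figure quoted above.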

The least-significant-difference test was introduced to overcome the
disadvantage of such a large error probability involving $k$ means. In
applying this test, the first step is to use an $\alpha$ level $F$ test to determine if the
variance ratio for the $k$ means is significant. If there is no significance, then
$H_0$ is accepted; otherwise, the multiple $t$ test is applied. In our example,
the $F$ ratio is 2.60 and the upper five per cent point for $F$ with seven and
24 degrees of freedom is $F_{.05}(7, 24) = 2.42$. Thus, the eight means are not
homogeneous, and the multiple $t$ test is used to obtain the results given
above.

As has been mentioned, the purpose of the initial $F$ test is to remove
the high error probability of the multiple $t$ test for wrongly rejecting the
hypothesis that the eight means are equal. Nevertheless, the l.s.d. test fails 
to insure the reduction of similarly high error probabilities for wrongly 
rejecting the hypothesis that p (p = 3, . . . , 7) of the means are equal. 
However, if this principle of using a preliminary F test were carried to its 
logical conclusion, any test involving a group of means would need to be 
preceded by a preliminary F test of the group. Obviously, such a procedure 
becomes unwieldy. Fortunately, nearly the same results can be achieved 
by replacing the F tests with range tests and doing away with the t test. 

Before we discuss the multiple range tests, it may be useful to give another
reason why an experimenter who is interested only in testing differences
in pairs of means would not choose to use an initial $F$ test. The fact that
the $F$ test also tests the significance of all linear comparisons of the means
causes trouble in connection with the significance level. For example,
consider the three means 2.0, 2.1, and 3.0 which are obtained from samples
of size five. The treatment mean square is 1.52, and the error mean square
is $s^2 = 0.41$. The $F$ ratio is 3.71 and the upper five per cent level $F$ value
is $F_{.05}(2, 12) = 3.89$. Thus, we fail to reject the null hypothesis that
$H_0\colon \mu_1 = \mu_2 = \mu_3$. However, in a similar problem, when the means are
2.0, 2.0, and 3.0 with the same error mean square $s^2 = 0.41$, the treatment
mean square is 1.67, and the $F$ ratio becomes 4.07. In this case, we reject
the null hypothesis and conclude that 3.0 is significantly larger than 2.0
after applying the five per cent level $t$ test. Since the error mean square,
the number of means, and the largest and smallest means are the same
in both examples, it does not seem right that we should reach two different
conclusions concerning the same difference. It appears that the test of the
difference 3.0 - 2.0 in the extreme means should depend on the number of
means and $s^2$. This is the situation when multiple range tests are used.

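The two $F$ ratios in this illustration can be recomputed from the means alone. A minimal sketch, assuming equal samples of size five and the error mean square 0.41 given in the text; `f_ratio` is an illustrative name.

```python
def f_ratio(means, n, error_ms):
    """Treatment mean square and F ratio for equal-sized samples."""
    k = len(means)
    grand = sum(means) / k
    treatment_ms = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    return treatment_ms, treatment_ms / error_ms

f_crit = 3.89                                   # F_.05(2, 12), as quoted in the text
ms_a, f_a = f_ratio([2.0, 2.1, 3.0], 5, 0.41)
ms_b, f_b = f_ratio([2.0, 2.0, 3.0], 5, 0.41)
print(round(ms_a, 2), round(f_a, 2), f_a > f_crit)
print(round(ms_b, 2), round(f_b, 2), f_b > f_crit)
```

Moving the middle mean from 2.1 to 2.0 is enough to carry the ratio across the critical value, even though the extreme means are unchanged.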
The Newman-Keuls multiple range test overcomes some of the
disadvantages already encountered. This procedure, which was first suggested
by "Student," developed by Newman [25], and amplified by Keuls [15], is
equivalent to a multiple $t$ test preceded by several studentized range tests.
Since the $t$ tests of which the multiple $t$ test is composed may be regarded
as range tests of subsets of two means each, the over-all procedure is
composed entirely of range tests and may be usefully termed a multiple range
test.

An $\alpha$ level Newman-Keuls multiple range test is given by the following
rule:

The difference between any two means in a set of k means is
significant, provided the range of each subset which contains
the given two means is significant according to an $\alpha$ level range
test. We say that a set S of numbers contains a number e,
provided e is not smaller than the smallest number in S and
not larger than the largest number in S.
To apply this test, arrange the means in increasing order of magnitude and
apply $\alpha$ level (studentized) range tests to all possible combinations of the
k means taken p at a time (p = 2, ..., k). If any combination of p means
has a nonsignificant range, then the decision is made that these p means are
homogeneous, and any combination of means within these p means is
homogeneous. For any combination of means which is not homogeneous,
the highest mean is significantly larger than the lowest mean. In our example,
the five per cent least significant ranges [$r_p$ denotes least
significant studentized ranges and $R_p = r_p s_{\bar{x}}$ denotes the least significant
range for p means] for $\nu = 24$ degrees of freedom and subsets of size two
to eight are found with the use of Table IX to be

    p        2      3      4      5      6      7      8

    $r_p$    2.92   3.53   3.90   4.17   4.37   4.54   4.68

    $R_p$    7.7    9.3    10.3   11.0   11.5   11.9   12.3


Since the over-all difference 13.1 - 1.2 = 11.9 of the means in (10.88) is less
Table 10.10*

Least Significant Studentized Ranges $r_p$ for Duncan's Multiple Range Test

$\alpha = \alpha_2 = 0.05$

    $\nu$ \ p    2      3      4      5      6      7      8      9

        1     18.0   18.0   18.0   18.0   18.0   18.0   18.0   18.0
        2     6.09   6.09   6.09   6.09   6.09   6.09   6.09   6.09
        3     4.50   4.50   4.50   4.50   4.50   4.50   4.50   4.50
        4     3.93   4.01   4.02   4.02   4.02   4.02   4.02   4.02
        5     3.64   3.74   3.79   3.83   3.83   3.83   3.83   3.83
        6     3.46   3.58   3.64   3.68   3.68   3.68   3.68   3.68
        7     3.35   3.47   3.54   3.58   3.60   3.61   3.61   3.61
        8     3.26   3.39   3.47   3.52   3.55   3.56   3.56   3.56
        9     3.20   3.34   3.41   3.47   3.50   3.52   3.52   3.52
       10     3.15   3.30   3.37   3.43   3.46   3.47   3.47   3.47
       11     3.11   3.27   3.35   3.39   3.43   3.44   3.45   3.46
       12     3.08   3.23   3.33   3.36   3.40   3.42   3.44   3.44
       13     3.06   3.21   3.30   3.35   3.38   3.41   3.42   3.44
       14     3.03   3.18   3.27   3.33   3.37   3.39   3.41   3.42
       15     3.01   3.16   3.25   3.31   3.36   3.38   3.40   3.42
       16     3.00   3.15   3.23   3.30   3.34   3.37   3.39   3.41
       17     2.98   3.13   3.22   3.28   3.33   3.36   3.38   3.40
       18     2.97   3.12   3.21   3.27   3.32   3.35   3.37   3.39
       19     2.96   3.11   3.19   3.26   3.31   3.35   3.37   3.39
       20     2.95   3.10   3.18   3.25   3.30   3.34   3.36   3.38
       22     2.93   3.08   3.17   3.24   3.29   3.32   3.35   3.37
       24     2.92   3.07   3.15   3.22   3.28   3.31   3.34   3.37
       26     2.91   3.06   3.14   3.21   3.27   3.30   3.34   3.36
       28     2.90   3.04   3.13   3.20   3.26   3.30   3.33   3.35
       30     2.89   3.04   3.12   3.20   3.25   3.29   3.32   3.35
       40     2.86   3.01   3.10   3.17   3.22   3.27   3.30   3.33
       60     2.83   2.98   3.08   3.14   3.20   3.24   3.28   3.31
      100     2.80   2.95   3.05   3.12   3.18   3.22   3.26   3.29
        ∞     2.77   2.92   3.02   3.09   3.15   3.19   3.23   3.26

* Reproduced from David B. Duncan, "Multiple Range and Multiple F Tests,"
Biometrics, Vol. 11 (1955), pp. 3-4, with permission of the editor of the journal.
Table 10.10

Least Significant Studentized Ranges $r_p$ for Duncan's Multiple Range Test (cont.)

$\alpha = \alpha_2 = 0.01$
than 12.3, the least significant difference for eight means, we conclude that
$\bar{x}_3$ and $\bar{x}_4$ are not significantly different and that there are no significant
differences in pairs of means. The results are conveniently written as follows

    $\bar{x}_4$  $\bar{x}_1$  $\bar{x}_8$  $\bar{x}_6$  $\bar{x}_5$  $\bar{x}_2$  $\bar{x}_7$  $\bar{x}_3$

       1.2          2.7          3.9          7.2          8.2          10.3         10.9         13.1
    ------------------------------------------------------------------------------------------------------


Duncan's multiple range test is applied like the Newman-Keuls test, but
the least significant ranges are different. If we let $\alpha_2$ denote the significance
level for two means, the significance level for p (p = 3, ..., k) means is
given by

$$\alpha_p = 1 - (1 - \alpha_2)^{p-1}$$

Thus, for Duncan's five per cent level test $\alpha_2 = 0.05$, $\alpha_3 = 0.0975$,
$\alpha_4 = 0.1426$, ..., $\alpha_9 = 0.3366$. That is, in Duncan's $\alpha = \alpha_2$ level test
there is a set of significance levels, one for each number of means p. For this
reason special tables have been
computed by Duncan and are reproduced in Table 10.10.
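The protection levels quoted above follow directly from the formula $\alpha_p = 1 - (1 - \alpha_2)^{p-1}$. A minimal sketch (the function name is illustrative only):

```python
def duncan_alpha(alpha_2, p):
    """Significance level for a range of p means in Duncan's test,
    given the two-mean level alpha_2: 1 - (1 - alpha_2)**(p - 1)."""
    return 1.0 - (1.0 - alpha_2) ** (p - 1)


# Levels for a five per cent test, p = 2, ..., 9
for p in range(2, 10):
    print(p, round(duncan_alpha(0.05, p), 4))
```

The level grows with p, which is why Duncan's least significant ranges increase more slowly than those of the Newman-Keuls test, where every range is tested at the same level.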

Example 10.10. Illustrate Duncan's multiple range test, using the means
and standard error of (10.88).

Let $\alpha = \alpha_2 = 0.05$. Then the $\alpha_p$ (p = 2, ..., 8) least significant
studentized ranges $r_p$ found in Table 10.10 and the least significant ranges
$R_p = r_p s_{\bar{x}}$ for the data are as follows

    p        2      3      4      5      6      7      8

    $r_p$    2.92   3.07   3.15   3.22   3.28   3.31   3.34

    $R_p$    7.7    8.1    8.3    8.5    8.7    8.8    8.8


The application of these least significant ranges $R_p$ to the differences in
ordered means of (10.88) is as follows:

1. For eight means $R_8 = 8.8$ and $\bar{x}_3 - \bar{x}_4 = 13.1 - 1.2 = 11.9$. Since
11.9 > 8.8, we conclude that $\bar{x}_3$ is significantly larger than $\bar{x}_4$, or
$\mu_3 > \mu_4$.

2. For seven means $R_7 = 8.8$ and $\bar{x}_3 - \bar{x}_1 = 13.1 - 2.7 = 10.4$. Since
10.4 > 8.8, we conclude that $\bar{x}_3$ is significantly larger than $\bar{x}_1$, or
$\mu_3 > \mu_1$.

3. For six means $R_6 = 8.7$ and $\bar{x}_3 - \bar{x}_8 = 13.1 - 3.9 = 9.2$. Since
9.2 > 8.7, we conclude that $\bar{x}_3$ is significantly larger than $\bar{x}_8$, or
$\mu_3 > \mu_8$.

4. For five means $R_5 = 8.5$ and $\bar{x}_3 - \bar{x}_6 = 13.1 - 7.2 = 5.9$. Since
5.9 < 8.5, we conclude that $\bar{x}_3 - \bar{x}_6$ is not significantly different
from zero. Further, it follows that the differences between all other
pairs of means in the set $\bar{x}_6, \bar{x}_5, \bar{x}_2, \bar{x}_7, \bar{x}_3$ are not
significantly different from zero. Continuing in this way, we obtain
the results which follow.

5. $\bar{x}_7 - \bar{x}_4 = 9.7 > R_7 = 8.8$; thus, $\mu_7$ is significantly larger than $\mu_4$.

6. $\bar{x}_7 - \bar{x}_1 = 8.2 < R_6 = 8.7$; thus, $\bar{x}_7$ and $\bar{x}_1$ do not differ significantly,
and it follows that the differences between all other pairs in the set
$\bar{x}_1, \bar{x}_8, \bar{x}_6, \bar{x}_5, \bar{x}_2, \bar{x}_7$, in addition to those already covered in 4,
are not significant.

7. $\bar{x}_2 - \bar{x}_4 = 9.1 > R_6 = 8.7$; thus, $\mu_2$ is significantly larger than $\mu_4$.

8. $\bar{x}_5 - \bar{x}_4 = 7.0 < R_5 = 8.5$; thus, $\bar{x}_5$ and $\bar{x}_4$ do not differ significantly,
as well as all pairs in the set $\bar{x}_4, \bar{x}_1, \bar{x}_8, \bar{x}_6, \bar{x}_5$.

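These results are conveniently shown as follows

    $\bar{x}_4$  $\bar{x}_1$  $\bar{x}_8$  $\bar{x}_6$  $\bar{x}_5$  $\bar{x}_2$  $\bar{x}_7$  $\bar{x}_3$

       1.2          2.7          3.9          7.2          8.2          10.3         10.9         13.1
    ---------------------------------------------------------------
                 ----------------------------------------------------------------------------
                                           ---------------------------------------------------------------

That is, in testing each difference between two means we determine that

$\mu_3 > \mu_4$, $\mu_3 > \mu_1$, $\mu_3 > \mu_8$, $\mu_7 > \mu_4$, and $\mu_2 > \mu_4$

all other pairs of means being declared not significantly different.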
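The step-down logic common to the Newman-Keuls and Duncan procedures can be sketched in a few lines. This is only an illustration under stated assumptions: the function name is hypothetical, the means are the eight ordered values quoted in the example, and the least significant ranges $R_p$ are copied from the two tables in the text rather than computed.

```python
def multiple_range_test(means, lsr):
    """Step-down multiple range test.

    means: sample means (any order).
    lsr:   dict mapping p (number of means spanned) to the least
           significant range R_p.
    Returns (significant_pairs, homogeneous_sets), with pairs given as
    (smaller mean, larger mean) and sets as (low index, high index)
    into the sorted means.
    """
    xs = sorted(means)
    k = len(xs)
    homogeneous = []   # index ranges whose over-all range was not significant
    significant = []
    for p in range(k, 1, -1):          # widest ranges first
        for i in range(k - p + 1):
            j = i + p - 1
            # A pair inside a homogeneous set is not tested further.
            if any(a <= i and j <= b for a, b in homogeneous):
                continue
            if xs[j] - xs[i] > lsr[p]:
                significant.append((xs[i], xs[j]))
            else:
                homogeneous.append((i, j))
    return significant, homogeneous


means = [1.2, 2.7, 3.9, 7.2, 8.2, 10.3, 10.9, 13.1]
duncan = {2: 7.7, 3: 8.1, 4: 8.3, 5: 8.5, 6: 8.7, 7: 8.8, 8: 8.8}
newman_keuls = {2: 7.7, 3: 9.3, 4: 10.3, 5: 11.0, 6: 11.5, 7: 11.9, 8: 12.3}

print(multiple_range_test(means, duncan))
print(multiple_range_test(means, newman_keuls))
```

With Duncan's ranges the sketch reports five significant pairs and three homogeneous sets, matching the eight steps above; with the Newman-Keuls ranges the over-all range 11.9 is below 12.3, so all eight means form one homogeneous set.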

Tukey has presented two multiple range tests. His 1949 five per cent
test [32] is like a multiple t test except that he chooses as his over-all least
significant range the one obtained by fixing the significance level for k
means at five per cent. Theorem 10.1 is used with this method. In the above
example, this fixes his "two" mean significance level around 0.5 per cent
and the "three" mean significance level around one per cent. His 1953 test
[34] fixes the least significant ranges halfway between those for his 1949
test and the Newman-Keuls test. A comparison of the least significant studentized
ranges for eight treatments with 24 degrees of freedom may be made with
the aid of Table 10.11.
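The "halfway" rule for Tukey's 1953 test is easy to verify numerically. The sketch below assumes the Newman-Keuls studentized ranges for 24 degrees of freedom quoted earlier (p = 2, ..., 8); the variable names are illustrative only.

```python
# Newman-Keuls five per cent studentized ranges, 24 d.f., p = 2, ..., 8
newman_keuls = [2.92, 3.53, 3.90, 4.17, 4.37, 4.54, 4.68]

# Tukey's 1949 test uses the k-mean value (here p = 8) for every p.
tukey_1949 = [newman_keuls[-1]] * len(newman_keuls)

# Tukey's 1953 test: halfway between the 1949 and Newman-Keuls values.
tukey_1953 = [(nk + t49) / 2 for nk, t49 in zip(newman_keuls, tukey_1949)]

for p, value in zip(range(2, 9), tukey_1953):
    print(p, value)
```

The printed values agree, to within rounding in the second decimal place, with the Tukey 1953 row of the comparison table.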

Other multiple test procedures are described by Hartley [13] and Tukey
[33]. All multiple range tests apply to more general situations. Their
applications in other (more complicated) analyses are discussed by Duncan
[5] and Kramer [18]. The effect of sample size has also been examined.
Harter [11] discusses the selection of appropriate sample size, and Duncan
[5] and Kramer [17] give methods for applying the multiple range tests when
samples are of unequal sizes.

Table 10.11

Least Significant Studentized Ranges for Five Per Cent Level Tests

    Test \ p          2      3      4      5      6      7      8

    Multiple t        2.92   2.92   2.92   2.92   2.92   2.92   2.92
    Duncan            2.92   3.07   3.15   3.22   3.28   3.31   3.34
    Newman-Keuls      2.92   3.53   3.90   4.17   4.37   4.54   4.68
    Tukey 1953        3.80   4.11   4.29   4.42   4.53   4.61   4.68
    Tukey 1949        4.68   4.68   4.68   4.68   4.68   4.68   4.68

We have emphasized that the multiple range tests are used for testing 
the difference between two means. In many problems, the experimenter is 
interested in testing certain comparisons, in which case either Theorem 10.1 
or a multiple F test should be used. A multiple F test corresponding to each
multiple range test can be described, and its application would be similar 
to that of the range test except least significant F values would be used in 
place of least significant range values. Scheffe [29] has described a multiple 
F test corresponding to Tukey’s 1949 range test, and Duncan [4] has de- 
scribed a multiple F test corresponding to his multiple range test. The amount 
of work involved in applying a multiple F test is many times greater than 
that required for the companion range test. This being the case, multiple 
F tests should not, in general, be used, unless the investigator is interested 
in contrasts involving more than two means. 

At this point the reader no doubt wonders which multiple test procedure
should be used. In general, there is no simple, clear-cut answer. If only one
contrast is tested, the t test is the best, and if all contrasts are examined,
either Scheffé's method, described in Sect. 10.8.2, or Theorem 10.1 should be
applied. (But neither of these extremes is often considered appropriate;
usually an experimenter wishes to test more than one hypothesis but fewer
than 1000, say.) If one wishes to test simultaneously all hypotheses of the
form $H_0\colon \mu_i - \mu_j = 0$ ($i \ne j$; $i, j = 1, \ldots, k$) these methods should not
be used. Of all the tests now available, it appears that Duncan's new multiple
range test is the best for testing differences in all pairs of means. However,
some recent studies by Duncan [6] indicate that there may be a better test
procedure for testing differences in all pairs.

One who is interested in theory of multiple-decision problems should 
read what Lehmann [19, 20, 21] has to say. For a study of power for multiple 
tests, see Wine [35]. 

10.10. EXERCISES

10.17. (a) Find constants a, b, and d so that the two linear combinations

$\gamma_1 = \mu_1 + \mu_2 + a\mu_3 \qquad \gamma_2 = b\mu_1 + d\mu_2$

are orthogonal contrasts. (b) The three sample totals in Exercise 10.4
are

$T_1 = 402 \qquad T_2 = 397 \qquad T_3 = 458$

Find the treatment sum of squares and the sums of squares for the two
contrasts in (a); i.e., find TSS, $Q_1^2$ and $Q_2^2$. (c) Suppose samples 1 and 2
represent the production of old standard machines and sample 3 represents
the production of a new machine. What linear contrast would
you use in order to compare production of the new machine with that
of the standard machines? Find a contrast orthogonal to this one.

10.18. (a) In an experiment with five treatments there are four orthogonal
contrasts. Write a set of multipliers for four such contrasts. Describe
an experiment in which these four contrasts are meaningful (in the
experimental context). (b) Write two other different sets of multipliers
for four orthogonal contrasts.

10.19. In a completely randomized design with four treatments and seven
observations per treatment, the error sum of squares is 0.09 and the
totals are $T_1 = 19.5$, $T_2 = 19.2$, $T_3 = 20.1$, and $T_4 = 19.2$. Treatments
1 and 2 are to be compared to treatments 3 and 4, treatments 1 and 3
to 2 and 4, treatments 1 and 4 to 2 and 3. (a) State the three hypotheses
in terms of linear contrasts. Are they orthogonal? (b) Compute the
error and treatment mean squares and the three components of the
treatment sum of squares with a single degree of freedom. (c) Test each
hypothesis at the five per cent level. What is the significance level for
the experiment?

10.20. In a completely randomized experiment there are n observations for
each of four treatments. $Q_1^2$, $Q_2^2$, and $Q_3^2$ are components with an
individual degree of freedom for three orthogonal linear contrasts.
Prove that $Q_1^2 + Q_2^2 + Q_3^2$ is equal to the treatment sum of squares.

10.21. Prove Eq. (10.73).

10.22. Prove Eq. (10.77).

Hint: Assume that the n sample values $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ are all
different. Think of $x_{(1)}$ as falling in class (interval) $C_1$; $x_{(2)}, \ldots, x_{(n-1)}$
as falling in class $C_2$; and $x_{(n)}$ as falling in class $C_3$, where $C_1$, $C_2$, and $C_3$
are mutually exclusive and exhaustive classes. Then, show that the
joint distribution of the range w and smallest value $x = x_{(1)}$ in a sample
is

$$f(w, x) = n(n-1) f(x) f(x+w) [F(x+w) - F(x)]^{n-2}$$

10.23. Let the rectangular density function be

$$f(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

Show that

$$f_n(w) = n(n-1) w^{n-2} (1 - w)$$

10.24. Find the density function of the range w if x has the density function
$f(x) = e^{-x}$, $x \ge 0$.

10.25. Find the expressions for the density function $f_n(w)$ and the distribution
function $F_n(w)$ of the standardized random variable $W = w/\sigma$ if x is
distributed normally with mean $\mu$ and variance $\sigma^2$.

10.26. Prove that when k = 2 the statistic defined by Eq. (10.80) is a t statistic.

10.27. (a) Use the studentized range and the information in Exercise 10.19
to test at the five per cent level the hypothesis $\mu_1 = \mu_2 = \mu_3 = \mu_4$.

(b) Use Theorem 10.1 to find simultaneously 95 per cent confidence 
intervals for the three comparisons defined in Exercise 10.19. Use 
this information to test the three hypotheses in Exercise 10.19. (c) Use 
Scheffe’s method to find simultaneously 95 per cent confidence intervals 
for the three comparisons defined in Exercise 10.19. (d) Use Dunn’s 
method to find the 95 per cent confidence intervals of the comparisons 
of (b) and (c). 

10.28. (a) Use the studentized range and the data in Exercise 10.16 to test at
the five per cent level the hypothesis
$\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6 = \mu_7$.

(b) Use Theorem 10.1 to find simultaneous 95 per cent confidence 
intervals for five comparisons with multipliers -j. f , — I ; five comparisons 
with multipliers 4. ~ U Rod five comparisons with multipliers 4, ■!> 

—4, — f. (c) Use Scheffe’s method to find simultaneous 95 percent 
confidence intervals for the 15 comparisons of (b). (d) Use Dunn’s 
method to find simultaneous 95 per cent confidence intervals for the 
15 comparisons of (b). 

10.29. Derive Eq. (10.82) from Eq. (10.81). 

10.30. (a) Use the data of Exercise 10.5 to find simultaneous 95 per cent
confidence intervals for an infinite number of linear contrasts of
$\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$, applying Scheffé's method. Make a table similar to
Table 10.8 and include at least 15 contrasts. (b) Use Dunn's method
to find simultaneous 95 per cent confidence intervals for the contrasts
listed in (a).

10.31. The means of six treatments replicated five times in a completely
randomized design are $\bar{x}_1 = 70$, $\bar{x}_2 = 105$, $\bar{x}_3 = 125$, $\bar{x}_4 = 160$, $\bar{x}_5 = 100$,
$\bar{x}_6 = 190$. The error sum of squares is 60,000. (The reader may think
of the means as resulting from coded data. For example, the
observations may be such things as weights of six groups of experimental
animals, heights of cakes using six preparations, lives of six kinds of
highway surface, yields of dyestuff, thrusts of rocket motors,
measurements made by operators, responses after training, and concentrations
of solutions.) (a) Use Duncan's multiple range test to make a pairwise
ranking of these six means. Let $\alpha_2 = 0.05$. (b) Use the Newman-Keuls
five per cent level multiple range test to make a pairwise ranking of
these six means. (c) Use Tukey's 1953 five per cent level multiple test
procedure to make a pairwise ranking of these six means. (d) Use the
five per cent multiple t test to rank the six means.

10.32. Use the data of Exercise 10.5 to test at the five per cent level the
significance of the difference between all pairs of means, applying (a)
Duncan's multiple range test, and (b) Newman-Keuls multiple range test.
Hint: In the multiple test procedure described in Sect. 10.9 all
samples are the same size, and the least significant range for p means
is $R_p = r_p s_{\bar{x}}$, where $r_p$ is the least significant studentized range and
$s_{\bar{x}} = \sqrt{s^2/n}$. Since $s_{\bar{x}}$ can be written as

$$\sqrt{\frac{s^2}{2}\left(\frac{1}{n} + \frac{1}{n}\right)}$$

one would expect that, for the case of unequal sample sizes, the standard
deviation of the mean should be

$$\sqrt{\frac{s^2}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Use this last expression in (a) and (b).
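The adjustment suggested in the hint can be sketched as follows. The function names are hypothetical, and the form $\sqrt{(s^2/2)(1/n_i + 1/n_j)}$ is the usual unequal-sample-size adjustment assumed here; the studentized range $r_p$ must still be read from a table.

```python
import math


def standard_error_pair(s2, n_i, n_j):
    """Standard error used when comparing means i and j with unequal
    sample sizes: sqrt((s^2 / 2) * (1/n_i + 1/n_j))."""
    return math.sqrt(s2 / 2.0 * (1.0 / n_i + 1.0 / n_j))


def least_significant_range(r_p, s2, n_i, n_j):
    """R_p = r_p times the pairwise standard error above."""
    return r_p * standard_error_pair(s2, n_i, n_j)


# With equal sample sizes the expression reduces to sqrt(s^2 / n).
print(standard_error_pair(4.0, 5, 5), math.sqrt(4.0 / 5))
# With unequal sizes each pair of means gets its own standard error.
print(least_significant_range(2.92, 4.0, 4, 7))
```

Because the standard error now depends on the pair being compared, the least significant range must be recomputed for each difference tested.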
MORE ABOUT THE ONE-WAY
CLASSIFICATION


Fixed effects and random effects models for a one-way classification are
compared. Concepts of the one-way classification are extended to the nested
classification. Applications for both equal and unequal sample sizes are
given.

11.1. INTRODUCTION

In Chap. 10 we discussed problems relating to several means (or effects)
in a one-way classification. We used variances in these problems, but the
emphasis was on comparisons of means. In many investigations one is
primarily concerned with the estimation or testing of variances (or
components of variance), means (or effects) being of secondary interest. We
illustrate the difference by considering Example 10.1.

Recall that we compared the mean tensile strengths $\mu_1$, $\mu_2$, and $\mu_3$ of
copper wire (of a certain gauge) of three specific manufacturers A, B, and C.
Ten pieces of wire were randomly selected from A, ten from B, and ten from
C. The tensile strength of each piece was measured and the sample means
computed and used in making statements about the three specific population
means. Other manufacturers were not mentioned; we were not interested
in them. This is an example of what is called a fixed effects experiment or
a model I experiment. However, in other situations we might be interested
in using a sample of three manufacturers to make a statement about all
manufacturers of this gauge of copper wire. For this purpose the three
manufacturers would be randomly selected from among all manufacturers of
copper wire, and then ten pieces of wire would be randomly selected from
each of the three manufacturers. The three sample means would be computed
as before, but in this case the investigator is not particularly interested
in who the manufacturers are; instead, he is interested in making a statement
about the variability of all population means. This is an example of what
we call a random effects experiment or a model II experiment. The difference
is that for model I a repetition of the experiment would necessarily require
that the same three manufacturers be selected, but for model II a repetition
of the experiment could (and is likely to) give a different set of manufacturers.
To summarize, for model I we make inferences about the particular
treatments selected; for model II we make inferences about the population of
treatments from which a random sample of treatments was drawn.

In another illustration, if we made an inference about the mean temper- 
ature in August at six specific large cities in the United States, this would 
be a fixed effects experiment; if the six cities were randomly selected, it would 
be a random effects experiment in which an inference would be drawn about 
all cities. As a further illustration, if the mean responses at five convenient
(fixed) temperatures are compared, the experiment is model I; if the five
temperatures are randomly selected from some interval of values, the 
experiment is model II. 


11.2. THE RANDOM EFFECTS EXPERIMENT IN A ONE-WAY CLASSIFICATION 

The model equation for the jth observation in the ith sample of a random
effects experiment is exactly the same as it is for the fixed effects experiment
[see Eq. (10.10)]; that is

$$x_{ij} = \mu + \alpha_i + e_{ij} \qquad (i = 1, \ldots, k;\ j = 1, \ldots, n_i) \tag{11.1}$$

Just as before, $\mu$ is fixed, and $e_{ij}$ represents the random (error) component
of observation $x_{ij}$. For both model I and II it is assumed that the variance
of $e_{ij}$ is $\sigma^2$; that is, the error variances for the k populations are equal. Further,
$\mu_e = E(e_{ij}) = 0$ for each i. The difference in model I and II is in the
assumption about the $\alpha_i$'s. For model I we assume that the $\alpha_i$'s are fixed and,
as a consequence of Eq. (10.11), that

$$\sum_{i=1}^{k} \alpha_i = 0$$

But for model II we assume that the $\alpha_i$'s represent a random sample of effects
from the population of $\alpha$'s with mean zero, that is, $\mu_\alpha = E(\alpha_i) = 0$, and
variance $\sigma_a^2$. Also, we usually assume that $\alpha_i$ and $e_{ij}$ are independently
distributed. (Only in the rare case would we have

$$\sum_{i=1}^{k} \alpha_i = 0$$

in model II.) For model I we may think of

$$\frac{1}{k-1} \sum_{i=1}^{k} \alpha_i^2$$

as a type of variance of the $\alpha_i$'s, since we really are not interested in the
population of $\alpha$'s.

In a fixed effects experiment we were interested in specific treatment
effects $\alpha_i$ (i = 1, ..., k). Thus, we estimated $\alpha_i$ by single values ($\hat{\alpha}_i$) and
intervals and tested the null hypothesis

$H_0\colon \alpha_1 = \cdots = \alpha_k = 0$ (or $\sum \alpha_i^2 = 0$)

against the alternative hypothesis

$H_1\colon$ some of the $\alpha$'s are not zero (or $\sum \alpha_i^2 \ne 0$)

In the random effects experiment we are interested in the variation (or
variance $\sigma_a^2$) of the population of $\alpha$'s. Thus we wish to estimate $\sigma_a^2$ and
test the null hypothesis

$$H_0\colon \sigma_a^2 = 0 \tag{11.2}$$

against the alternative hypothesis

$$H_1\colon \sigma_a^2 > 0 \tag{11.3}$$

The test of the null hypothesis for the fixed effects experiment was
established with the aid of the sum of squares identity and the analysis of
variance Table 10.4. Following arguments similar to those in Sect. 10.3,
we can justify the analysis of variance Table 11.1, which is for samples of
equal size. (In Sect. 11.3 we actually prove that the expected mean squares
are as indicated.)

Table 11.1

Analysis of Variance for Random Effects Experiments in One-Way Classifications

    Source of       Sum of Squares                                         Degrees of    Mean      Expected
    Variation                                                              Freedom       Square    Mean Square

    Among means     $\sum_i T_{i\cdot}^2/n - T_{\cdot\cdot}^2/(kn)$                 $k - 1$       $s_a^2$   $\sigma^2 + n\sigma_a^2$

    Within          $\sum_i \sum_j x_{ij}^2 - \sum_i T_{i\cdot}^2/n$                $k(n - 1)$    $s_p^2$   $\sigma^2$

    Total           $\sum_i \sum_j x_{ij}^2 - T_{\cdot\cdot}^2/(kn)$                $nk - 1$

To test the hypothesis that $\sigma_a^2 = 0$, we refer to Table 11.1. It is obvious
that $E(s_p^2)$ is equal to $\sigma^2$, no matter whether the hypothesis is true or false.
But $E(s_a^2)$ is equal to $\sigma^2$ only when the hypothesis in (11.2) is true; otherwise,
$E(s_a^2)$ is greater than $\sigma^2$ by the amount $n\sigma_a^2$. Thus, just as with the fixed effects
experiment, $s_a^2/s_p^2$ is distributed as the random variable F with k - 1 and
k(n - 1) degrees of freedom, provided that hypothesis (11.2) is true and
that $x_{ij}$ is a random normal variate. If the null hypothesis is false, we expect
$s_a^2/s_p^2$ to be larger than unity. Thus, the hypothesis is rejected if

$$\frac{s_a^2}{s_p^2} > F_\alpha[k - 1, k(n - 1)]$$

where $F_\alpha[k - 1, k(n - 1)]$ is the upper $\alpha$ level value of F with k - 1 and
k(n - 1) degrees of freedom. Numerically, the test of $\sigma_a^2 = 0$ is exactly like
the test of the fixed effects hypothesis $\alpha_1 = \cdots = \alpha_k = 0$, but the
interpretations are quite different.

Table 11.1 can also be used to estimate how much the treatment means
(or effects) differ; that is, $\sigma_a^2$ can be estimated. The table indicates that $s_a^2$
is an unbiased estimator of $\sigma^2 + n\sigma_a^2$, and $s_p^2$ is an unbiased estimator of $\sigma^2$.
Thus, an unbiased point estimator, $\hat{\sigma}_a^2$, of $\sigma_a^2$ is given by

$$\hat{\sigma}_a^2 = \frac{s_a^2 - s_p^2}{n} \tag{11.4}$$

If $(s_a^2 - s_p^2)/n$ is negative, the estimate $\hat{\sigma}_a^2$ is taken to be zero.

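A confidence interval estimate of $\sigma_a^2$ would be more useful than a point
estimate if it could be obtained. Working toward this end we note that
$s_a^2/(\sigma^2 + n\sigma_a^2)$ and $s_p^2/\sigma^2$ are independently distributed as $\chi^2$ per degree of
freedom with k - 1 and k(n - 1) degrees of freedom, respectively. Thus,
the ratio

$$\frac{s_a^2/(\sigma^2 + n\sigma_a^2)}{s_p^2/\sigma^2}$$

is distributed as F with k - 1 and k(n - 1) degrees of freedom; that is

$$F = \frac{s_a^2}{s_p^2} \cdot \frac{\sigma^2}{\sigma^2 + n\sigma_a^2} \tag{11.5}$$

or

$$F = \frac{s_a^2}{s_p^2} \cdot \frac{1}{1 + n\sigma_a^2/\sigma^2} \tag{11.6}$$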
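If $F_1$ and $F_2$ denote values of the variate F for which

$$P(F < F_1) = \frac{\alpha}{2} \qquad \text{and} \qquad P(F > F_2) = \frac{\alpha}{2}$$

then 100(1 - $\alpha$) per cent of the values of F lie in the interval

$$F_1 < F < F_2 \tag{11.7}$$

From (11.5) and (11.7) it follows that

$$F_1 < \frac{s_a^2}{s_p^2} \cdot \frac{\sigma^2}{\sigma^2 + n\sigma_a^2} < F_2$$

and, hence, the 100(1 - $\alpha$) per cent confidence interval for the ratio $\sigma_a^2/\sigma^2$ is

$$\frac{1}{n}\left[\frac{s_a^2}{s_p^2 F_2} - 1\right] < \frac{\sigma_a^2}{\sigma^2} < \frac{1}{n}\left[\frac{s_a^2}{s_p^2 F_1} - 1\right] \tag{11.8}$$

The limits of (11.8) are exact. Thus, if $\sigma^2$ is known, the limits

$$\frac{\sigma^2}{n}\left[\frac{s_a^2}{s_p^2 F_2} - 1\right] < \sigma_a^2 < \frac{\sigma^2}{n}\left[\frac{s_a^2}{s_p^2 F_1} - 1\right] \tag{11.9}$$

of $\sigma_a^2$ are also exact. Unfortunately, $\sigma^2$ is not usually known. If we replace
$\sigma^2$ by $s_p^2$, the approximate 100(1 - $\alpha$) per cent confidence limits of $\sigma_a^2$ are

$$\frac{s_p^2}{n}\left[\frac{s_a^2}{s_p^2 F_2} - 1\right] < \sigma_a^2 < \frac{s_p^2}{n}\left[\frac{s_a^2}{s_p^2 F_1} - 1\right] \tag{11.10}$$

The limits in (11.10) give satisfactory approximations if $s_p^2$ has a reasonably
large number of degrees of freedom, say 15 or more. If the first value in
(11.10) is negative, the lower limit is set equal to zero.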

Bross [4) and other authors 12, 3, S 14, 10 22) have given other methods 
for constructing approximate limits for components of variance. The 1 - $\alpha$
limits for $\sigma_a^2$ derived by Bross are



(II 11) 


where $F'$ has k - 1 and $\infty$ degrees of freedom, $F'_1$ and $F'_2$ are lower and
upper $\alpha/2$ points, respectively, and the point estimate of $\sigma_a^2$ is given by
Eq. (11.4). The limits given by (11.11) purport to be better approximations
than the limits given by (11.10), particularly for small numbers of degrees
of freedom.

Example 11.1. Thinking of the data in Example 10.1 as being from a
random effects experiment, estimate the components of variance $\sigma^2$ and $\sigma_a^2$.

From the analysis of variance table following Example 10.1, we find
$s_a^2 = 2410$ with two degrees of freedom and $s_p^2 = 1457$ with 27 degrees of
freedom. An unbiased point estimate of $\sigma^2$ is given by $s_p^2 = 1457$, and an
unbiased point estimate of $\sigma_a^2$, according to Eq. (11.4), is

$$\hat{\sigma}_a^2 = \frac{2410 - 1457}{10} = 95.3$$

Since $27 s_p^2/\sigma^2$ is distributed as $\chi^2$ with 27 degrees of freedom, and
$\chi^2_{.95} = 16.15$ and $\chi^2_{.05} = 40.11$, the 90 per cent confidence interval for $\sigma^2$
is given by

$$\frac{39{,}347}{40.11} < \sigma^2 < \frac{39{,}347}{16.15}$$

or

$$981.0 < \sigma^2 < 2436.4$$

Further, since $F_1 = F_{.95}(2, 27) = 1/F_{.05}(27, 2) = 1/19.46 = 0.05139$ and
$F_2 = F_{.05}(2, 27) = 3.354$, the 90 per cent confidence interval for $\sigma_a^2$, when
(11.10) is used, is given by

$$\frac{1457}{10}\left[\frac{2410}{(1457)(3.354)} - 1\right] < \sigma_a^2 < \frac{1457}{10}\left[\frac{2410}{(1457)(0.05139)} - 1\right]$$

or

$$-73.9 < \sigma_a^2 < 4544$$

Replacing the negative lower limit by zero, we obtain

$$0 < \sigma_a^2 < 4544$$

For the limits given by Bross we find $F'_2 = F_{.05}(2, \infty) = 3.00$ and
$F'_1 = 1/F_{.05}(\infty, 2) = 1/19.5$. Thus, by (11.11)

$$0 < \sigma_a^2 < 4544$$

The limits given by (11.10) and (11.11) are the same to four significant
figures, but differ in the fifth. In this case, it is clear that the simpler Formula
(11.10) should be applied.
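The point estimate (11.4) and the approximate limits (11.10) used in the example can be reproduced with a short sketch. The function names are illustrative, and the F values are passed in as tabled constants (0.05139 and 3.354 from the example) rather than computed.

```python
def variance_component_estimate(ms_among, ms_within, n):
    """Point estimate (s_a^2 - s_p^2) / n, set to zero when negative."""
    return max(0.0, (ms_among - ms_within) / n)


def approximate_limits(ms_among, ms_within, n, f_lower, f_upper):
    """Approximate confidence limits for the among-groups variance
    component; f_lower and f_upper are the lower and upper alpha/2
    points of F with k - 1 and k(n - 1) degrees of freedom."""
    lower = ms_within / n * (ms_among / (ms_within * f_upper) - 1.0)
    upper = ms_within / n * (ms_among / (ms_within * f_lower) - 1.0)
    return lower, upper


estimate = variance_component_estimate(2410, 1457, 10)
lower, upper = approximate_limits(2410, 1457, 10, f_lower=0.05139, f_upper=3.354)
print(round(estimate, 1))                       # point estimate
print(round(lower, 1), round(upper))            # raw limits from (11.10)
print(max(0.0, lower), round(upper))            # negative lower limit set to zero
```

The raw lower limit is negative because the observed ratio 2410/1457 is smaller than the upper F point, which is exactly the situation in which the F test fails to reject $\sigma_a^2 = 0$.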

The power of the F test of the hypothesis $\sigma_a^2 = 0$ may be determined
numerically as it is in Chap. 9. That is, for $n_1 = \cdots = n_k = n$ we find
power as a function of $\lambda$, where

$$\lambda^2 = \frac{\sigma^2 + n\sigma_a^2}{\sigma^2} = 1 + n\frac{\sigma_a^2}{\sigma^2} \tag{11.12}$$

Since it is fairly straightforward, we leave it to the reader to determine the
power function for a particular test and to find the smallest equal-sized
samples one can use for specified values of $\alpha$, $\beta$, and $\sigma_a^2$.


11.3. EXPECTED MEAN SQUARES


The proofs of this section are presented primarily for two reasons. We
wish to show that the expected mean squares in Table 11.1 are correct and
to introduce a method of proof which is easily extended to more complicated
analysis of variance experiments. It is also informative to learn where and
in what order the assumptions are introduced and to see what role each
assumption plays in the derivation. We use a detailed proof to show that
$E(s_a^2) = \sigma^2 + n\sigma_a^2$. Then we indicate how the method of proof can be used
to show that

$$E(s_a^2) = \sigma^2 + \frac{n \sum \alpha_i^2}{k - 1}$$

for model I and $E(s_p^2) = \sigma^2$ for both model I and model II experiments in
a one-way classification. The method is also applied for samples of unequal
size.

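First, we note that the sum of squares identity is an algebraic
relationship which results from the use of algebra; it in no way depends upon the
distribution assumptions associated with the effects. Thus, since both the
fixed effects and random effects experiments have the same model equation

$$x_{ij} = \mu + \alpha_i + e_{ij} \qquad (i = 1, \ldots, k;\ j = 1, \ldots, n_i) \tag{11.13}$$

it follows that their sum of squares identities are identical. For the case
where $n_1 = \cdots = n_k = n$, the computing form of the sum of squares identity
is

$$\sum_i \sum_j x_{ij}^2 - \frac{T_{\cdot\cdot}^2}{kn} = \left[\frac{\sum_i T_{i\cdot}^2}{n} - \frac{T_{\cdot\cdot}^2}{kn}\right] + \left[\sum_i \sum_j x_{ij}^2 - \frac{\sum_i T_{i\cdot}^2}{n}\right] \tag{11.14}$$

and the mean squares for among means and within samples are given by

$$s_a^2 = \frac{1}{k - 1}\left(\frac{\sum_i T_{i\cdot}^2}{n} - \frac{T_{\cdot\cdot}^2}{kn}\right) \qquad s_p^2 = \frac{1}{k(n - 1)}\left(\sum_i \sum_j x_{ij}^2 - \frac{\sum_i T_{i\cdot}^2}{n}\right) \tag{11.15}$$

respectively.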
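The computing forms of the sums of squares and mean squares for equal sample sizes can be sketched directly from the sample totals. The function name and the small data set below are made up for illustration only.

```python
def one_way_anova(samples):
    """Sums of squares and mean squares for k samples of equal size n,
    using the computing forms based on the totals T_i. and T.. ."""
    k = len(samples)
    n = len(samples[0])
    totals = [sum(sample) for sample in samples]          # T_i.
    grand_total = sum(totals)                             # T..
    raw = sum(x * x for sample in samples for x in sample)
    correction = grand_total ** 2 / (k * n)
    among = sum(t * t for t in totals) / n - correction
    within = raw - sum(t * t for t in totals) / n
    total = raw - correction
    return {
        "among_ss": among,
        "within_ss": within,
        "total_ss": total,
        "ms_among": among / (k - 1),
        "ms_within": within / (k * (n - 1)),
    }


# Hypothetical data: three samples of three observations each
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

For these made-up data the among and within sums of squares are 26 and 6, which add to the total of 32, as the identity requires for any data.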

In the random effects experiment we assume that k populations are
randomly selected and then within each of these populations n random
observations are made, the selection procedure depending in no way upon the
particular population. Thus, the effects

$$\alpha_1, \ldots, \alpha_k;\ e_{11}, \ldots, e_{1n};\ e_{21}, \ldots, e_{kn}$$

are random variables which are independently distributed. Therefore, it
follows that, since we assume that $\alpha_i$ is distributed with mean $\mu_\alpha = 0$ and
variance $\sigma_a^2$ and that $e_{ij}$ is independently distributed with $\mu_e = 0$
and variance $\sigma_e^2 = \sigma^2$, we can write

$$E(\alpha_i) = \mu_\alpha = 0 \qquad \text{and} \qquad E(e_{ij}) = \mu_e = 0 \tag{11.16}$$

$$E[\alpha_i - E(\alpha_i)]^2 = E(\alpha_i^2) = \sigma_a^2 \qquad \text{(common for all } \alpha_i\text{)} \tag{11.17}$$

$$E[e_{ij} - E(e_{ij})]^2 = E(e_{ij}^2) = \sigma^2 \qquad \text{(common for all } e_{ij}\text{)} \tag{11.18}$$

and

$$\begin{aligned}
\operatorname{cov}(\alpha_i, \alpha_{i'}) &= E\{[\alpha_i - E(\alpha_i)][\alpha_{i'} - E(\alpha_{i'})]\} = E(\alpha_i \alpha_{i'}) = 0 && (i \ne i') \\
\operatorname{cov}(e_{ij}, e_{i'j'}) &= E(e_{ij} e_{i'j'}) = 0 && (i \ne i', \text{ or } j \ne j' \text{ if } i = i') \\
\operatorname{cov}(\alpha_{i'}, e_{ij}) &= E(\alpha_{i'} e_{ij}) = 0 && (i = i' \text{ or } i \ne i')
\end{aligned} \tag{11.19}$$

The expressions in (11.19) follow from the fact that any two different effects
are independently distributed. The normality assumption is not required
for any of the derivations of this section. However, it is required for tests
of hypotheses and the construction of confidence intervals.

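From

$$E(s_a^2) = E\left[\frac{1}{k - 1}\left(\frac{\sum_i T_{i\cdot}^2}{n} - \frac{T_{\cdot\cdot}^2}{kn}\right)\right]$$

we obtain, using properties of expectation

$$E(s_a^2) = \frac{1}{k - 1}\left\{\frac{1}{n}\left[E(T_{1\cdot}^2) + \cdots + E(T_{k\cdot}^2)\right] - \frac{1}{kn}E(T_{\cdot\cdot}^2)\right\} \tag{11.20}$$

Now, for the ith population, we may write

$$E(T_{i\cdot}^2) = E(x_{i1} + \cdots + x_{in})^2 = E[n\mu + n\alpha_i + (e_{i1} + \cdots + e_{in})]^2$$

by (11.13).

Using properties of algebra and expectation on this last expression, we
have

$$\begin{aligned}
E(T_{i\cdot}^2) = {} & n^2\mu^2 + n^2 E(\alpha_i^2) + E(e_{i1} + \cdots + e_{in})^2 + 2n^2\mu E(\alpha_i) \\
& + 2n\mu E(e_{i1} + \cdots + e_{in}) + 2n E[\alpha_i(e_{i1} + \cdots + e_{in})]
\end{aligned} \tag{11.21}$$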
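We may write Eq. (11.21) as

$$E(T_{i\cdot}^2) = n^2\mu^2 + n^2\sigma_a^2 + n\sigma^2 \qquad (i = 1, \ldots, k) \tag{11.22}$$

since, by Eqs. (11.16), (11.17), (11.18), and (11.19) we have

$$\begin{aligned}
E(e_{i1} + \cdots + e_{in})^2 &= E(e_{i1}^2) + \cdots + E(e_{in}^2) + 2E(e_{i1}e_{i2}) + \cdots + 2E(e_{i,n-1}e_{in}) \\
&= \sigma^2 + \cdots + \sigma^2 + 2 \cdot 0 + \cdots + 2 \cdot 0 = n\sigma^2
\end{aligned}$$

$$E(\alpha_i) = 0, \qquad E(\alpha_i^2) = \sigma_a^2$$

$$E(e_{i1} + \cdots + e_{in}) = E(e_{i1}) + \cdots + E(e_{in}) = 0 + \cdots + 0 = 0$$

and

$$E[\alpha_i(e_{i1} + \cdots + e_{in})] = E(\alpha_i e_{i1}) + \cdots + E(\alpha_i e_{in}) = 0$$

Further, since the right-hand side of Eq. (11.22) does not depend on i,
we have

$$E(T_{1\cdot}^2) + \cdots + E(T_{k\cdot}^2) = kn^2\mu^2 + kn^2\sigma_a^2 + kn\sigma^2 \tag{11.23}$$

In order to evaluate Eq. (11.20), it only remains to obtain $E(T_{\cdot\cdot}^2)$. Thus

$$\begin{aligned}
E(T_{\cdot\cdot}^2) &= E(T_{1\cdot} + \cdots + T_{k\cdot})^2 \\
&= E\{[n\mu + n\alpha_1 + (e_{11} + \cdots + e_{1n})] + \cdots + [n\mu + n\alpha_k + (e_{k1} + \cdots + e_{kn})]\}^2 \\
&= E[kn\mu + n(\alpha_1 + \cdots + \alpha_k) + (e_{11} + \cdots + e_{kn})]^2
\end{aligned}$$

or

$$\begin{aligned}
E(T_{\cdot\cdot}^2) = {} & k^2n^2\mu^2 + n^2 E(\alpha_1 + \cdots + \alpha_k)^2 + E(e_{11} + \cdots + e_{kn})^2 \\
& + 2kn^2\mu E(\alpha_1 + \cdots + \alpha_k) + 2kn\mu E(e_{11} + \cdots + e_{kn}) \\
& + 2n E[(\alpha_1 + \cdots + \alpha_k)(e_{11} + \cdots + e_{kn})]
\end{aligned} \tag{11.24}$$

so that

$$E(T_{\cdot\cdot}^2) = k^2n^2\mu^2 + kn^2\sigma_a^2 + kn\sigma^2 \tag{11.25}$$

since

$$\begin{aligned}
E(\alpha_1 + \cdots + \alpha_k)^2 &= E(\alpha_1^2) + \cdots + E(\alpha_k^2) + 2E(\alpha_1\alpha_2) + \cdots + 2E(\alpha_{k-1}\alpha_k) \\
&= \sigma_a^2 + \cdots + \sigma_a^2 + 2 \cdot 0 + \cdots + 2 \cdot 0 = k\sigma_a^2
\end{aligned}$$

$$E(e_{11} + \cdots + e_{kn})^2 = kn\sigma^2$$

$$E(\alpha_1 + \cdots + \alpha_k) = 0$$

$$E(e_{11} + \cdots + e_{kn}) = 0$$

and

$$E[(\alpha_1 + \cdots + \alpha_k)(e_{11} + \cdots + e_{kn})] = E(\alpha_1 e_{11}) + \cdots + E(\alpha_1 e_{kn}) + \cdots + E(\alpha_k e_{11}) + \cdots + E(\alpha_k e_{kn}) = 0$$

Substituting Eqs. (11.23) and (11.25) in Eq. (11.20) gives

$$\begin{aligned}
E(s_a^2) &= \frac{1}{k - 1}\left[(kn\mu^2 + kn\sigma_a^2 + k\sigma^2) - (kn\mu^2 + n\sigma_a^2 + \sigma^2)\right] \\
&= \frac{1}{k - 1}\left[(kn - n)\sigma_a^2 + (k - 1)\sigma^2\right]
\end{aligned}$$

or

$$E(s_a^2) = n\sigma_a^2 + \sigma^2 \tag{11.26}$$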
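The result $E(s_a^2) = n\sigma_a^2 + \sigma^2$ can be checked empirically with a small Monte Carlo sketch. Everything here is an illustrative assumption: the function name, the parameter values, the seed, and the use of normal effects (the derivation itself needs only the moment assumptions, not normality).

```python
import random


def simulate_mean_square_among(k, n, mu, sigma_a, sigma, reps, seed=0):
    """Average the among-means mean square over many simulated
    random effects experiments with k groups of n observations."""
    rng = random.Random(seed)
    running_sum = 0.0
    for _ in range(reps):
        totals = []
        for _ in range(k):
            effect = rng.gauss(0.0, sigma_a)              # random group effect
            totals.append(sum(mu + effect + rng.gauss(0.0, sigma)
                              for _ in range(n)))         # T_i.
        grand_total = sum(totals)                         # T..
        among_ss = sum(t * t for t in totals) / n - grand_total ** 2 / (k * n)
        running_sum += among_ss / (k - 1)
    return running_sum / reps


k, n, sigma_a, sigma = 5, 4, 2.0, 1.0
average = simulate_mean_square_among(k, n, mu=10.0, sigma_a=sigma_a,
                                     sigma=sigma, reps=20000)
print(average, n * sigma_a ** 2 + sigma ** 2)   # simulated value vs. theoretical 17
```

The simulated average should lie close to the theoretical value, and changing $\mu$ should have no effect, in agreement with the cancellation of the $\mu^2$ terms in the derivation.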


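Next, we give an outline of the derivation of $E(s_T^2)$ for the fixed effects experiment. The only properties different from those used in proving Eq. (11.26) are as follows:

Change from | To
$\alpha_i$ random | $\alpha_i$ fixed
$E(\alpha_i) = 0$ | $E(\alpha_i) = \alpha_i$
$E(\alpha_i^2) = \sigma_\alpha^2$ | $E(\alpha_i^2) = \alpha_i^2$
$E(c\alpha_i) = 0$ | $E(c\alpha_i) = c\alpha_i$

where $c$ is a constant. Then Eq. (11.21) becomes

$$E(T_{i.}^2) = n^2\mu^2 + n^2\alpha_i^2 + n\sigma^2 + 2n^2\mu\alpha_i \qquad (i = 1, \ldots, k) \tag{11.27}$$

and since $\sum_i \alpha_i = 0$, Eq. (11.24) becomes

$$E(T_{..}^2) = k^2n^2\mu^2 + kn\sigma^2 \tag{11.28}$$

Substituting Eqs. (11.27) and (11.28) in Eq. (11.20) gives

$$E(s_T^2) = \frac{1}{k-1}\left[\left(kn\mu^2 + n\sum_i \alpha_i^2 + k\sigma^2 + 2n\mu\sum_i \alpha_i\right) - (kn\mu^2 + \sigma^2)\right]$$

or

$$E(s_T^2) = \sigma^2 + \frac{n\sum_i \alpha_i^2}{k-1}, \qquad \text{since } \sum_i \alpha_i = 0 \tag{11.29}$$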

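To find the expected mean square for $s^2$ we first prove that

$$E\left(\sum_i\sum_j x_{ij}^2\right) = \begin{cases} kn\mu^2 + n\sum_i \alpha_i^2 + nk\sigma^2, & \text{for model I} \\ nk\mu^2 + nk\sigma_\alpha^2 + nk\sigma^2, & \text{for model II} \end{cases} \tag{11.30}$$

Then we substitute Eq. (11.30) and Eq. (11.23) or Eq. (11.27) in

$$E(s^2) = \frac{1}{k(n-1)}\left[E\left(\sum_i\sum_j x_{ij}^2\right) - \frac{1}{n}\sum_i E(T_{i.}^2)\right]$$

to obtain

$$E(s^2) = \sigma^2$$

When the samples are of unequal size, it is easy to show for model II that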




/T1\ . 


and, therefore, on substitution that 


£(,■) = ,. + („ . 


For model I the expected mean square for treatments depends on the restriction among the $\alpha_i$'s. It can be shown that


<r‘ + 


. (2 (" 52 ) 

— -j — - — , when 2 ®i “ ® 


Note that the latter expression for $E(s_T^2)$ can be written as


With a little practice the reader should discover certain shortcuts to the above method. For example, notice that only the square terms $\mu^2$, $\alpha_i^2$, $\epsilon_{ij}^2$ contribute nonzero values to the expected mean square. Thus, from an expression in $\mu$, $\alpha_i$, and $\epsilon_{ij}$ we can read off the expected mean square almost immediately, bearing in mind, of course, all the assumptions of the particular model being considered. In the future, we give the expected mean squares in the analysis of variance table without presenting the derivation.

11.4. SUBGROUPS WITHIN A ONE-WAY CLASSIFICATION

The simplest design for an experiment is the one-way classification described in some detail in this chapter and Chap. 10. This design is also referred to as a completely randomized design. Sometimes it is desirable to subsample in a completely randomized design, and a more refined analysis becomes necessary. For an illustration (Example 11.2) of this new design, consider an experiment described by Davies [11, 12].

Example 11.2. Large batches of a chemical paste, regularly produced, are placed in casks for deliveries. Three casks in each batch are selected at random, and a sample is taken for analysis. Two independent analytical tests are carried out on a part of each sample to determine the paste strength. The data resulting from ten random batches are given in Table 11.2. The problem is to estimate the batch-to-batch variance $\sigma_\alpha^2$, the cask-to-cask variance $\sigma_\delta^2$, and the variance $\sigma^2$ resulting from the analytical tests, or to test hypotheses about $\sigma_\alpha^2$ or $\sigma_\delta^2$. Before describing the general approach, we give calculations for this equal-size sample variance components experiment.


Table 11.2

Percentage of Paste Strength of Samples

             Cask 1                 Cask 2                 Cask 3
Batch    Observations   Total   Observations   Total   Observations   Total   Batch Total
   1     62.8   62.6    125.4   60.1   62.3    122.4   62.7   63.1    125.8      373.6
   2     60.0   61.4    121.4   57.5   56.9    114.4   61.1   58.9    120.0      355.8
   3     58.7   57.5    116.2   63.9   63.1    127.0   65.4   63.7    129.1      372.3
   4     57.1   56.4    113.5   56.9   58.6    115.5   64.7   64.5    129.2      358.2
   5     55.1   55.1    110.2   54.7   54.2    108.9   58.8   57.5    116.3      335.4
   6     63.4   64.9    128.3   59.3   58.1    117.4   60.5   60.0    120.5      366.2
   7     62.5   62.6    125.1   61.0   58.7    119.7   56.9   57.7    114.6      359.4
   8     59.2   59.4    118.6   65.2   66.0    131.2   64.8   64.1    128.9      378.7
   9     54.8   54.8    109.6   64.0   64.0    128.0   57.7   56.8    114.5      352.1
  10     58.3   59.3    117.6   59.2   59.2    118.4   58.9   56.6    115.5      351.5

Grand Total 3603.2


The sums of squares (corrected) and degrees of freedom for the analysis of variance are shown in Table 11.3. They are computed from the raw data and totals by methods similar to those described in Sect. 10.5. Thus, the uncorrected sums of squares are given by

Observation SS = $(62.8)^2 + (62.6)^2 + \cdots + (56.6)^2 = 217{,}002.82$
Casks SS = $[(125.4)^2 + (122.4)^2 + \cdots + (115.5)^2]/2 = 216{,}982.48$
Batch SS = $[(373.6)^2 + (355.8)^2 + \cdots + (351.5)^2]/6 = 216{,}631.57$
Grand total SS = $(3603.2)^2/60 = 216{,}384.17$

and the corrected sums of squares by

Between batches SS = 216,631.57 - 216,384.17 = 247.40
Between casks within batches SS = 216,982.48 - 216,631.57 = 350.91
Between observations within casks SS = 217,002.82 - 216,982.48 = 20.34
Total SS = 217,002.82 - 216,384.17 = 618.65

The expected mean squares may be obtained by the methods of Sect. 11.3. The general expressions are given later in this section, and it is left as an exercise for the reader to justify these results.
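The arithmetic above can be reproduced directly from the observations of Table 11.2. The sketch below is illustrative only; the variable and function names are hypothetical, and the variance component estimates at the end assume the equal-sample expected mean squares $\sigma^2 + 2\sigma_\delta^2 + 6\sigma_\alpha^2$, $\sigma^2 + 2\sigma_\delta^2$, and $\sigma^2$ for batches, casks, and tests.

```python
# Table 11.2: batch -> cask -> the two analytical tests on that cask
paste = [
    [(62.8, 62.6), (60.1, 62.3), (62.7, 63.1)],
    [(60.0, 61.4), (57.5, 56.9), (61.1, 58.9)],
    [(58.7, 57.5), (63.9, 63.1), (65.4, 63.7)],
    [(57.1, 56.4), (56.9, 58.6), (64.7, 64.5)],
    [(55.1, 55.1), (54.7, 54.2), (58.8, 57.5)],
    [(63.4, 64.9), (59.3, 58.1), (60.5, 60.0)],
    [(62.5, 62.6), (61.0, 58.7), (56.9, 57.7)],
    [(59.2, 59.4), (65.2, 66.0), (64.8, 64.1)],
    [(54.8, 54.8), (64.0, 64.0), (57.7, 56.8)],
    [(58.3, 59.3), (59.2, 59.2), (58.9, 56.6)],
]

def nested_sums_of_squares(data):
    """Corrected sums of squares for a balanced batch/cask/test layout."""
    obs = [x for batch in data for cask in batch for x in cask]
    r = len(data[0][0])                     # tests per cask
    n = len(data[0])                        # casks per batch
    correction = sum(obs) ** 2 / len(obs)   # grand total SS
    obs_ss = sum(x * x for x in obs)
    cask_ss = sum(sum(cask) ** 2 for batch in data for cask in batch) / r
    batch_ss = sum(sum(sum(cask) for cask in batch) ** 2 for batch in data) / (n * r)
    return {
        "batches": batch_ss - correction,
        "casks": cask_ss - batch_ss,
        "tests": obs_ss - cask_ss,
        "total": obs_ss - correction,
    }

ss = nested_sums_of_squares(paste)
k, n, r = len(paste), 3, 2
ms_batch = ss["batches"] / (k - 1)
ms_cask = ss["casks"] / (k * (n - 1))
ms_test = ss["tests"] / (k * n * (r - 1))

# Variance component estimates under the assumed expected mean squares
var_test = ms_test
var_cask = (ms_cask - ms_test) / r
var_batch = (ms_batch - ms_cask) / (n * r)

for name, value in ss.items():
    print(f"{name:8s} {value:8.2f}")
print(round(ms_batch, 2), round(ms_cask, 2), round(ms_test, 2))
```

Because the mean squares are carried at full precision here, the component estimates can differ in the second decimal place from values computed with mean squares rounded to two decimals.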


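Table 11.3

Analysis of Variance for Paste Strength

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | Expected Mean Square
Batches | 247.40 | 9 | 27.49 | $\sigma^2 + 2\sigma_\delta^2 + 6\sigma_\alpha^2$
Casks within batches | 350.91 | 20 | 17.55 | $\sigma^2 + 2\sigma_\delta^2$
Observations within casks | 20.34 | 30 | 0.68 | $\sigma^2$
Total | 618.65 | 59 | |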

From Table 11.3 it is clear that point estimates $s^2$, $s_\delta^2$, and $s_\alpha^2$ of the components of variance $\sigma^2$, $\sigma_\delta^2$, and $\sigma_\alpha^2$ are, respectively,

$$s^2 = 0.68 \quad \text{(error variance)}$$

$$s_\delta^2 = \frac{17.55 - 0.68}{2} = 8.44 \quad \text{(variance among casks)} \tag{11.34}$$

$$s_\alpha^2 = \frac{27.49 - 17.55}{6} = 1.66 \quad \text{(variance among batches)}$$

The standard deviations are $s = 0.82$, $s_\delta = 2.91$, and $s_\alpha = 1.29$. Note that the cask-to-cask variation is by far the largest.

If confidence intervals are required, we use the chi-square distribution or the methods introduced in Sect. 11.2. The 0.975 and 0.025 percentage points of the $\chi^2$ distribution for 30 degrees of freedom are given by $\chi_1^2 = 16.79$ and $\chi_2^2 = 46.98$, respectively. Thus, the 95 per cent confidence interval of $\sigma^2$ is given by

$$\frac{20.34}{46.98} < \sigma^2 < \frac{20.34}{16.79}$$

or

$$0.433 < \sigma^2 < 1.21$$

and the 95 per cent confidence interval of $\sigma$ by

$$0.66 < \sigma < 1.10 \tag{11.35}$$

To calculate approximate confidence limits of $\sigma_\delta^2$ we use Eq. (11.10). Since the lower and upper percentage points for $F$ with 20 and 30 degrees of freedom are $F_1 = 1/2.349$ and $F_2 = 2.195$, the approximate 95 per cent confidence limits of $\sigma_\delta^2$ are given by

$$\frac{\dfrac{17.55}{2.195} - 0.68}{2} = 3.66$$

and

$$\frac{(17.55)(2.349) - 0.68}{2} = 20.27$$

Thus the approximate 95 per cent confidence interval of $\sigma_\delta$ is

$$1.91 < \sigma_\delta < 4.50 \tag{11.36}$$


A similar method is used to derive approximate confidence limits of $\sigma_\alpha^2$ (or $\sigma_\alpha$) from the confidence limits for $\sigma_\alpha^2/\sigma_\delta^2$. Since the lower and upper percentage points for $F$ with nine and 20 degrees of freedom are $F_3 = 1/3.667$ and $F_4 = 2.837$, the approximate 95 per cent confidence limits of $\sigma_\alpha^2$ are given by

$$\frac{\dfrac{27.49}{2.837} - 17.55}{6} = -1.31$$

and

$$\frac{(27.49)(3.667) - 17.55}{6} = 13.88$$

Thus the approximate 95 per cent confidence interval of $\sigma_\alpha$ is

$$0 < \sigma_\alpha < 3.73 \tag{11.37}$$

since a negative variance or standard deviation is meaningless in this context. Note that the interval which estimates $\sigma_\alpha$ is much longer than the intervals which estimate $\sigma_\delta$ and $\sigma$.
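The interval calculations for the paste data can be collected in a few lines. This is a sketch only: the chi-square and $F$ percentage points are copied from the text rather than computed, the function names are hypothetical, and the form of the approximate limits is inferred from the worked numbers above, not taken from Eq. (11.10) itself.

```python
from math import sqrt

# Mean squares and error sum of squares quoted in the text for the paste data
ms_batch, ms_cask, ms_error, ss_error = 27.49, 17.55, 0.68, 20.34
r, n = 2, 3   # tests per cask, casks per batch

# Chi-square interval for the error variance (30 degrees of freedom)
chi_lower, chi_upper = 16.79, 46.98
var_error_ci = (ss_error / chi_upper, ss_error / chi_lower)

def approx_component_limits(ms_num, ms_den, f_upper, inv_f_lower, divisor):
    """Approximate limits for a variance component, as in the worked example.

    f_upper is the upper F percentage point; inv_f_lower is the reciprocal
    of the lower F percentage point (2.349 when F1 = 1/2.349).
    """
    lower = (ms_num / f_upper - ms_den) / divisor
    upper = (ms_num * inv_f_lower - ms_den) / divisor
    return lower, upper

def to_sd(limits):
    """Convert variance limits to standard deviation limits, truncating at zero."""
    return tuple(sqrt(max(v, 0.0)) for v in limits)

cask_ci = approx_component_limits(ms_cask, ms_error, 2.195, 2.349, r)
batch_ci = approx_component_limits(ms_batch, ms_cask, 2.837, 3.667, n * r)

print([round(v, 3) for v in var_error_ci], [round(v, 2) for v in to_sd(var_error_ci)])
print([round(v, 2) for v in cask_ci], [round(v, 2) for v in to_sd(cask_ci)])
print([round(v, 2) for v in batch_ci], [round(v, 2) for v in to_sd(batch_ci)])
```

Truncating the negative lower limit at zero before taking the square root mirrors the remark that a negative variance is meaningless.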

To test the hypothesis that there is no variation from cask to cask except that due to analytical tests, we compare the ratio 17.55/0.68 = 25.81 with 1.93, the upper five per cent $F$ value based on 20 and 30 degrees of freedom. This leads us to reject the hypothesis and conclude that there is real cask-to-cask variation. That is, we conclude that the variation from cask to cask is actually due to cask variance $\sigma_\delta^2$ as well as to error variance $\sigma^2$. To test the null hypothesis that $\sigma_\alpha^2 = 0$ we compare the ratio 27.49/17.55 = 1.57 with 2.39, the upper five per cent $F$ value based on nine and 20 degrees of freedom. Since 1.57 < 2.39, we fail to reject $\sigma_\alpha^2 = 0$. That is, we do not have enough evidence to say that the apparent variation from batch to batch is due to components other than cask variance $\sigma_\delta^2$ and error variance $\sigma^2$.

The data of Table 11.2 were used to illustrate the various techniques. The reader should realize that normally not all the above procedures would be used with a single experiment. Further, it should be pointed out that the variance component estimates may be used in the future to determine what sampling plan to apply when the costs of sampling and testing are known. The student is referred to Cochran [6], Hansen [15], and others [1, 17, 21] for more details on selecting the optimum sampling plan.

The magnitude of the observations shown in Table 11.2 results from the following three sources of variation: (1) batch, (2) cask within batch, and (3) observation in percentage paste strength within casks. In other illustrations the three terms "batch," "cask," and "observation in percentage paste strength" may be replaced, respectively, by such terms as

1. Chalks, laboratories, determinations of bulk density

2. Storage times, treatments, determinations of ascorbic acid in frozen strawberries

3. Growers, bales, determinations of percentage of foreign matter in cotton

4. Days, sheets of building material, determinations of permeabilities

5. Cities, blocks, determinations of response to a question

Each of the above may serve as an illustration of an experimental design known as a nested classification or a hierarchical classification or subsampling within a one-way classification, which has the model equation

$$x_{iju} = \mu + \alpha_i + \delta_{ij} + \epsilon_{iju} \qquad (i = 1, \ldots, k;\; j = 1, \ldots, n_i;\; u = 1, \ldots, n_{ij}) \tag{11.38}$$

where $x_{iju}$ denotes the $u$th determination within the $j$th sample within the $i$th population, $\mu$ denotes a constant over-all mean, $\alpha_i$ denotes the effect (constant or random) of the $i$th population, $\delta_{ij}$ denotes the effect (constant or random) of the $j$th sample within the $i$th population, and $\epsilon_{iju}$ denotes the $u$th random effect within the $j$th sample within the $i$th population. We often term the $\alpha_i$'s the treatment effects, the $\delta_{ij}$'s the experimental effects, and the $\epsilon_{iju}$'s the subsample effects in order to distinguish among the effects.

If we are interested only in finding unbiased point estimates of fixed effects, then all we need assume is that the observations are randomly and independently distributed. In order to establish confidence intervals for the effects and variance components or to make tests of hypotheses in the usual way, we need the following additional assumptions. If the $\alpha_i$ are fixed, then we measure them as deviations from $\mu$ such that

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} n_{ij}\alpha_i = 0$$

If the $\alpha_i$ are random, they are assumed to be from a normal population with mean zero and variance $\sigma_\alpha^2$. If the $\delta_{ij}$ are fixed, then we measure them as deviations from $\mu + \alpha_i$ such that

$$\sum_{j=1}^{n_i} n_{ij}\delta_{ij} = 0$$

If the $\delta_{ij}$ are random, they are assumed to be normally and independently distributed with means zero and common variance $\sigma_\delta^2$. The $\epsilon_{iju}$ are assumed to be normally and independently distributed with means zero and common variance $\sigma^2$. Further, all random effects are independently distributed. For example, if the $\alpha_i$ are fixed, whereas the $\delta_{ij}$ and $\epsilon_{iju}$ are random, then drawing a particular value of $\delta$ does not affect the probability of drawing a particular value of $\epsilon$. This design for subsampling can be extended indefinitely, and the model equation (11.38) and associated assumptions generalize in a straightforward fashion. The reader should note that there are

$$1 + k + \sum_{i=1}^{k} n_i$$

populations associated with the model equation (11.38).

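For a collection of data with model equation (11.38), it can be shown that the sum of squares identity is

$$\sum_i\sum_j\sum_u (x_{iju} - \bar{x}_{...})^2 = \sum_i\sum_j\sum_u (\bar{x}_{i..} - \bar{x}_{...})^2 + \sum_i\sum_j\sum_u (\bar{x}_{ij.} - \bar{x}_{i..})^2 + \sum_i\sum_j\sum_u (x_{iju} - \bar{x}_{ij.})^2 \tag{11.39}$$

where

$$T_{...} = \sum_i\sum_j\sum_u x_{iju}, \quad T_{i..} = \sum_j\sum_u x_{iju}, \quad T_{ij.} = \sum_u x_{iju}, \quad n_{i.} = \sum_j n_{ij}, \quad n_{..} = \sum_i\sum_j n_{ij}$$

$$\bar{x}_{ij.} = \frac{T_{ij.}}{n_{ij}}, \quad \bar{x}_{i..} = \frac{T_{i..}}{n_{i.}}, \quad \text{and} \quad \bar{x}_{...} = \frac{T_{...}}{n_{..}} \tag{11.40}$$

The computing form of the identity (11.39) is

$$\sum_i\sum_j\sum_u x_{iju}^2 - \frac{T_{...}^2}{n_{..}} = \left(\sum_i \frac{T_{i..}^2}{n_{i.}} - \frac{T_{...}^2}{n_{..}}\right) + \left(\sum_i\sum_j \frac{T_{ij.}^2}{n_{ij}} - \sum_i \frac{T_{i..}^2}{n_{i.}}\right) + \left(\sum_i\sum_j\sum_u x_{iju}^2 - \sum_i\sum_j \frac{T_{ij.}^2}{n_{ij}}\right) \tag{11.41}$$

The degrees of freedom identity associated with Eq. (11.39) or Eq. (11.41) is

$$n_{..} - 1 = (k - 1) + \left(\sum_i n_i - k\right) + \left(n_{..} - \sum_i n_i\right) \tag{11.42}$$

Note that $n_i \neq n_{i.}$. The symbol $n_i$ denotes the number of experimental units in the $i$th population, but $n_{i.}$ denotes the total number of subsample units in the $i$th population. Further,

$$\sum_i n_i$$

denotes the total number of experimental units in the experiment.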

As with the one-way classification, the model equation, along with the assumptions which follow, may be used to determine the expectations of the three component sums of squares on the right-hand side of Eq. (11.41). These expectations, together with the degrees of freedom identity and theorems in Chaps. 7 and 9, may be applied to determine the nature of the distributions of the component sums of squares as well as the distributions of their ratios. The expected mean squares for four experimental models in a nested classification in an analysis of variance are given in Table 11.4 (for unequal samples) and Table 11.5 (for equal samples). In all models $\mu$ is fixed. In addition to this, the $\alpha_i$ and $\delta_{ij}$ are fixed in the fixed model $(\alpha, \delta)$, only the $\alpha_i$ are fixed in the mixed model $(\alpha)$, only the $\delta_{ij}$ are fixed in the mixed model $(\delta)$, and no effects are fixed in the random model.

In the two models where the $\alpha_i$ are fixed, the unbiased estimate of the population mean

$$\mu_i = \mu + \alpha_i \qquad (i = 1, \ldots, k)$$

is given by $\bar{x}_{i..}$, and the unbiased estimate of the effect $\alpha_i$ is given by $a_i = \bar{x}_{i..} - \bar{x}_{...}$.

The procedure for testing the hypothesis

$$H_0: \mu_1 = \cdots = \mu_k \quad (\text{or } H_0: \alpha_1 = \cdots = \alpha_k = 0) \tag{11.43}$$

when the $\alpha_i$ are fixed, the hypothesis

$$H_0: \sigma_\alpha^2 = 0 \tag{11.44}$$

when the $\alpha_i$ are random, and the method of estimating the components of variance may be explained in terms of the expected mean squares shown in Tables 11.4 and 11.5. (It is left as an exercise for the student to verify these expected mean squares.)


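Table 11.4

Analysis of Variance and Expected Mean Squares for Nested Classifications with Unequal Samples

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | Expected Mean Square for Fixed Model $(\alpha, \delta)$
Among population means | $\sum_i T_{i..}^2/n_{i.} - T_{...}^2/n_{..}$ | $k - 1$ | $s_1^2$ | $\sigma^2 + \sum_i n_{i.}\alpha_i^2/(k - 1)$
Subsamples within population (experimental error) | $\sum_i\sum_j T_{ij.}^2/n_{ij} - \sum_i T_{i..}^2/n_{i.}$ | $\sum_i n_i - k$ | $s_2^2$ | $\sigma^2 + \sum_i\sum_j n_{ij}\delta_{ij}^2/(\sum_i n_i - k)$
Observations within subsamples (sampling error) | $\sum_i\sum_j\sum_u x_{iju}^2 - \sum_i\sum_j T_{ij.}^2/n_{ij}$ | $n_{..} - \sum_i n_i$ | $s_3^2$ | $\sigma^2$
Total | $\sum_i\sum_j\sum_u x_{iju}^2 - T_{...}^2/n_{..}$ | $n_{..} - 1$ | |

Source of Variation | Degrees of Freedom | Mean Square | Expected Mean Square for Mixed Model $(\alpha)$ | Expected Mean Square for Mixed Model $(\delta)$ | Expected Mean Square for Random Model
Population means | $k - 1$ | $s_1^2$ | $\sigma^2 + b\sigma_\delta^2 + \sum_i n_{i.}\alpha_i^2/(k - 1)$ | $\sigma^2 + c\sigma_\alpha^2$ | $\sigma^2 + b\sigma_\delta^2 + c\sigma_\alpha^2$
Experimental error | $\sum_i n_i - k$ | $s_2^2$ | $\sigma^2 + a\sigma_\delta^2$ | $\sigma^2 + \sum_i\sum_j n_{ij}\delta_{ij}^2/(\sum_i n_i - k)$ | $\sigma^2 + a\sigma_\delta^2$
Sampling error | $n_{..} - \sum_i n_i$ | $s_3^2$ | $\sigma^2$ | $\sigma^2$ | $\sigma^2$
Total | $n_{..} - 1$ | | | |

where

$$a = \frac{n_{..} - \sum_i \left(\sum_j n_{ij}^2 / n_{i.}\right)}{\sum_i n_i - k}, \qquad b = \frac{\sum_i \left(\sum_j n_{ij}^2 / n_{i.}\right) - \sum_i\sum_j n_{ij}^2 / n_{..}}{k - 1}, \qquad c = \frac{n_{..} - \sum_i n_{i.}^2 / n_{..}}{k - 1} \tag{11.45}$$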

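Table 11.5 (equal samples; the part of the table that is legible in the source)

Source of Variation | Degrees of Freedom | Expected Mean Square for Mixed Model $(\alpha)$ | Expected Mean Square for Mixed Model $(\delta)$ | Expected Mean Square for Random Model
Population means | $k - 1$ | $\sigma^2 + r\sigma_\delta^2 + nr\sum_i \alpha_i^2/(k - 1)$ | $\sigma^2 + nr\sigma_\alpha^2$ | $\sigma^2 + r\sigma_\delta^2 + nr\sigma_\alpha^2$
Experimental error | $k(n - 1)$ | $\sigma^2 + r\sigma_\delta^2$ | $\sigma^2 + r\sum_i\sum_j \delta_{ij}^2/[k(n - 1)]$ | $\sigma^2 + r\sigma_\delta^2$
Sampling error | $kn(r - 1)$ | $\sigma^2$ | $\sigma^2$ | $\sigma^2$
Total | $knr - 1$ | | |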

unbiased point estimators $s^2$, $s_\delta^2$, and $s_\alpha^2$ of the components $\sigma^2$, $\sigma_\delta^2$, and $\sigma_\alpha^2$, respectively, are given by

(11.47)


In the rare case where the $\delta_{ij}$'s are fixed and the $\alpha_i$'s are random, the estimators of $\sigma^2$ and $\sigma_\alpha^2$ are given by

(11.48)

Confidence limits for these components may be found by the methods described in Sect. 11.2 and illustrated earlier in this section. Point and interval estimates of the population means may be found in a straightforward way.

Now we consider the general case where $a \neq b$. If the $\delta_{ij}$'s are fixed, exact tests of Eqs. (11.43) and (11.44) apply, but when the $\delta_{ij}$'s are random, approximate tests must be found. The reader who is interested in details of such tests is referred to [8, 9, 10, 13, 17, 19, 20]. Some of the complications in computations, tests, and estimation in the case where the sample sizes are unequal are considered in the following example.

Example 11.3. Think of the tensile strength experiment first described in Example 9.2, in which ten random observations were made for each of three manufacturers of copper wire. Now suppose that rolls of wire are randomly selected from the manufacturers so that three come from A, three from B, and two from C. Further, suppose that the number of random determinations of tensile strength made on each roll of wire for a given manufacturer is proportional to the size of the roll. In this case, the data may be classified as shown in Table 11.6.

Table 11.6

Tensile Strength of Copper Wire in Pounds (Coded)

Manufacturer                  A                   B                   C
Roll                    1     2     3       1     2     3       1     2
Measurement           110   130    50     130    45   120     100   130
                       90   115    75      45    55    50     200    80
                      120   105    85      50    65   150      90    70
                                   40      40                  70    80
                                                               90   150
Totals (subsamples)   320   350   250     265   165   320     550   510
Totals (samples)            920                 750              1060

Grand Total 2730

The reader, no doubt, is already asking such questions as, "Isn't it going to be difficult to get random measurements from a roll of wire?" "Isn't this wasteful and time-consuming?" The answer to each question is "yes." But if such an experiment is desirable and the above suggested analysis is considered appropriate, the observations must be made in such a way that each of the eight subsamples can be considered random. We do not dwell on the practical problem of selecting random subsamples, since the primary purpose in introducing this example was to illustrate the techniques of analysis for the nested classifications when the samples and subsamples are unequal in size. The reader might just as well supply his own variables of classification. (Some of the cases listed immediately before model equation (11.38) may seem more appropriate.)

The numbers associated with the rolls have no special significance. They are used only to distinguish between rolls. We might just as well have numbered the rolls 1, 2, 3, 4, 5, 6, 7, 8 or have used any other notation which differentiates the eight rolls.

Calculations of the sums of squares are complicated by the presence of different divisors, and the coefficients for the expected mean squares require extra calculations. For the data in Table 11.6 we obtain


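$$\sum_i\sum_j\sum_u x_{iju}^2 = 110^2 + 90^2 + \cdots + 150^2 = 292{,}600$$

$$\sum_i\sum_j \frac{T_{ij.}^2}{n_{ij}} = \frac{320^2}{3} + \frac{350^2}{3} + \cdots + \frac{510^2}{5} = 263{,}876$$

$$\sum_i \frac{T_{i..}^2}{n_{i.}} = \frac{920^2}{10} + \frac{750^2}{10} + \frac{1060^2}{10} = 253{,}250$$

$$\frac{T_{...}^2}{n_{..}} = \frac{2730^2}{30} = 248{,}430$$

$$a = \frac{30 - \left(\dfrac{3^2 + 3^2 + 4^2}{10} + \dfrac{4^2 + 3^2 + 3^2}{10} + \dfrac{5^2 + 5^2}{10}\right)}{8 - 3} = 3.64$$

$$b = \frac{\left(\dfrac{3^2 + 3^2 + 4^2}{10} + \dfrac{4^2 + 3^2 + 3^2}{10} + \dfrac{5^2 + 5^2}{10}\right) - \dfrac{3^2 + 3^2 + \cdots + 5^2}{30}}{3 - 1} = 3.93$$

and the analysis of variance shown in Table 11.7.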
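The unequal divisors make these calculations error-prone by hand, so a short script is a useful check. The sketch below recomputes the sums of squares and the coefficients $a$ and $b$ from the data of Table 11.6; the variable names are hypothetical, and the coefficient formulas are the standard ones for a nested classification with unequal samples, which reproduce the values 3.64 and 3.93 quoted in the text.

```python
# Table 11.6: manufacturer -> rolls -> coded tensile strength measurements
wire = {
    "A": [[110, 90, 120], [130, 115, 105], [50, 75, 85, 40]],
    "B": [[130, 45, 50, 40], [45, 55, 65], [120, 50, 150]],
    "C": [[100, 200, 90, 70, 90], [130, 80, 70, 80, 150]],
}

rolls = [roll for maker in wire.values() for roll in maker]
N = sum(len(roll) for roll in rolls)      # total number of observations, n..
k = len(wire)                             # number of manufacturers
n_rolls = len(rolls)                      # total number of rolls

# Uncorrected terms of the computing form of the sum of squares identity
sum_x2 = sum(x * x for roll in rolls for x in roll)
roll_term = sum(sum(roll) ** 2 / len(roll) for roll in rolls)
maker_term = sum(
    sum(sum(roll) for roll in maker) ** 2 / sum(len(roll) for roll in maker)
    for maker in wire.values()
)
correction = sum(sum(roll) for roll in rolls) ** 2 / N

ss_makers = maker_term - correction       # among manufacturers
ss_rolls = roll_term - maker_term         # rolls within manufacturers
ss_within = sum_x2 - roll_term            # sampling within rolls

# Coefficients of the cask-type component in the expected mean squares
inner = sum(
    sum(len(roll) ** 2 for roll in maker) / sum(len(roll) for roll in maker)
    for maker in wire.values()
)
a = (N - inner) / (n_rolls - k)
b = (inner - sum(len(roll) ** 2 for roll in rolls) / N) / (k - 1)

print(ss_makers, ss_rolls, ss_within)
print(round(a, 2), round(b, 2))
```

The three sums of squares add up to the total corrected sum of squares, and the degrees of freedom are $k - 1 = 2$, $8 - 3 = 5$, and $30 - 8 = 22$.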

Table 11.7

Analysis of Variance for the Nested Data of Table 11.6

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | Expected Mean Square for Mixed Model $(\alpha)$
Manufacturers | 4,820 | 2 | 2410 | $\sigma^2 + 3.93\sigma_\delta^2 + 5\sum_i \alpha_i^2$
Rolls within manufacturers | 10,626 | 5 | 2125 | $\sigma^2 + 3.64\sigma_\delta^2$
Sampling within rolls | 28,724 | 22 | 1306 | $\sigma^2$
Total | 44,170 | 29 | |

Clearly, there is no difficulty in finding point and interval estimates of the $\alpha$'s, in finding point estimates of the components $\sigma^2$ and $\sigma_\delta^2$, and in testing the hypothesis $\sigma_\delta^2 = 0$. However, in order to test the hypothesis $\alpha_A = \alpha_B = \alpha_C = 0$, we must use an approximate test, since $a \neq b$. We now present an approximate test due to Satterthwaite [18].

First note that

$$\frac{b}{a}s_2^2 + \left(1 - \frac{b}{a}\right)s_3^2 \tag{11.49}$$

is an unbiased estimator of $\sigma^2 + b\sigma_\delta^2$, which is the expected value of $s_1^2$ when the null hypothesis is true. Thus, if we knew the number of degrees of freedom to associate with (11.49), we would expect

$$\frac{s_1^2}{\dfrac{b}{a}s_2^2 + \left(1 - \dfrac{b}{a}\right)s_3^2} \tag{11.50}$$

to be approximately distributed as some $F$ ratio when the null hypothesis is true. Actually, if we let

$$\nu = \frac{\left[\dfrac{b}{a}s_2^2 + \left(1 - \dfrac{b}{a}\right)s_3^2\right]^2}{\dfrac{\left[\dfrac{b}{a}s_2^2\right]^2}{\sum_i n_i - k} + \dfrac{\left[\left(1 - \dfrac{b}{a}\right)s_3^2\right]^2}{n_{..} - \sum_i n_i}} \tag{11.51}$$

it can be shown that (11.50) is approximately distributed as $F$ with $\nu_1 = 1$ and $\nu_2 = \nu$ degrees of freedom.

In our example, $b/a = 3.93/3.64 = 1.08$ and $1 - b/a = -0.08$. Thus

$$\nu = \frac{[(1.08)(2125) - (0.08)(1306)]^2}{\dfrac{[(1.08)(2125)]^2}{5} + \dfrac{[(0.08)(1306)]^2}{22}} = \frac{(2{,}190.52)^2}{1{,}053{,}901.18} = 4.55$$

and (11.50) becomes

$$\frac{2410}{(1.08)(2125) - (0.08)(1306)} = 1.10$$

Since the five per cent upper $F$ value with one and five degrees of freedom is 6.61, we fail to reject the null hypothesis that the manufacturers make the same strength of copper wire. (Of course, this is obvious on examination of Table 11.7. We made the calculations only for illustrative purposes.)

The calculations in Example 11.3 should make clear the need for equal size samples and equal size subsamples. Thus, in planning an experiment every effort should be made to keep the sample sizes equal.
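The arithmetic of the approximate test in Example 11.3 can be sketched as follows. This is an illustration under stated assumptions, not a general-purpose routine: the function name is hypothetical, the ratio $b/a$ is rounded to 1.08 as in the text, and the mean squares are the rounded values 2410, 2125, and 1306 from Table 11.7.

```python
def satterthwaite_df(terms):
    """Approximate degrees of freedom for a linear combination of mean squares.

    terms is a list of (coefficient, mean_square, degrees_of_freedom) tuples.
    """
    total = sum(c * ms for c, ms, _ in terms)
    return total ** 2 / sum((c * ms) ** 2 / df for c, ms, df in terms)

ratio = 1.08                                   # b / a, rounded as in the text
ms_makers, ms_rolls, ms_within = 2410, 2125, 1306

# Denominator of the approximate F ratio: (b/a) s_2^2 + (1 - b/a) s_3^2
terms = [(ratio, ms_rolls, 5), (1 - ratio, ms_within, 22)]
denominator = sum(c * ms for c, ms, _ in terms)

nu = satterthwaite_df(terms)
f_ratio = ms_makers / denominator

print(round(denominator, 2), round(nu, 2), round(f_ratio, 2))
```

Because one coefficient is negative, the approximation should be used with some care; here the negative term is small relative to the positive one.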

11.5. EXERCISES

11.1. The analysis of variance for a random model experiment in a one-way classification is as shown in Table 11.8. (a) What variance is used as a measure of dispersion of population means? Find a point estimate of this variance. Find a 95 per cent confidence interval estimate of this variance. What assumptions are required for each of these estimates? (b) What null hypothesis can one test with the above analysis of variance? What is the alternative hypothesis? Test the null hypothesis, stating assumptions and conclusion. (c) Describe an experiment which could lead to the above analysis of variance. State the conclusion for your experiment using the results of (b).

Table 11.8

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | Expected Mean Square
Among means | 392 | 4 | 98 | $\sigma^2 + 4\sigma_\alpha^2$
Within | 330 | 15 | 22 | $\sigma^2$
Total | 722 | 19 | |

11.2. An experiment was run to determine the effect of firing temperature on the density of bricks. Five firing temperatures were selected at random, and 12 bricks were used for each temperature. The sum of squares for among firing temperatures was 6.13, and the pooled variance among bricks within firing temperature was 0.64. (a) Estimate, in an appropriate manner, the effect of firing temperature on the density of bricks. Discuss your results. (b) Use two methods to find approximate 95 per cent confidence intervals for the standard deviation among temperatures. (c) In addition to the above information, suppose it is known that the mean density for the bricks is 2.3 (a coded value). Approximately what proportion of bricks would have density between 1.9 and 2.7?

11.3. The following hypothetical data are for a random effects experiment in a one-way classification.

              Treatment
   1     2     3     4     5     6
  51    44    53    60    69    55
  47    51    35    39    59    72
  49    49    43    49    57    41
  68    43    58    60    51    54

(a) Prepare an analysis of variance table with expected mean squares. (b) Test the null hypothesis $\sigma_\alpha^2 = 0$ against the alternative hypothesis $\sigma_\alpha^2 > 0$, giving the assumptions and conclusions. (c) Graph the power curve for the test in (b). Discuss the use of the curve as it relates to this problem. (d) Find 90 per cent confidence limits for $\sigma_\alpha^2$ by any method.

11.4. Prove Eq. (11.30) and then $E(s^2) = \sigma^2$.

11.5. Prove Eq. (11.31).

11.6. Prove Eq. (11.32).

11.7. Use the information in Exercise 10.5 to write the expected mean squares for both the fixed and random effects experiments in a one-way classification.

11.8. In a random effects experiment in a one-way classification, assume $n_1 = 3$, $n_2 = 6$, $n_3 = 5$, and $n_4 = 5$ with the analysis of variance given in Table 11.9. (a) Complete the analysis of variance table. (b) Find a point estimate of $\sigma_\alpha^2$. (c) Find, if possible, a 95 per cent confidence interval for $\sigma_\alpha$.


Table 11.9

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square
Treatments | | | 53
Within | | |
Total | 434 | |

11.9. Derive the expected mean squares for batches in Table 11.3.

11.10. The coded data for two determinations of ascorbic acid in frozen strawberries resulting from two storage times $S_1$ and $S_2$ and three treatments $T_1$, $T_2$, and $T_3$ within each storage time are as shown in Table 11.10. (a) Prepare an analysis of variance table similar to Table 11.3, assuming a fixed effects model. (b) Find unbiased estimates of effects of storage time and treatments within storage time. (c) State and test hypotheses about storage time and treatments. (d) Find 90 per cent confidence

Storage time


s, 



s, 


Treatments 

r, 

Tt 


T, 

T, 

Ti 



10 

n 

12 

14 


Determinations 

12 

9 

11 

IS 

12 

IS 


intervals for the two storage time means and for the six treatments within storage time means. (e) Assuming the data to be for a random effects model, state and test hypotheses about storage time and treatments within storage time. (f) Find 90 per cent confidence intervals for the three variance components involved in (e).

11.11. From each of five mixes four samples are randomly drawn, with three random subsamples being drawn from each of the 20 samples. Hypothetical data for such an experiment are given in Table 11.11. (a) Prepare an analysis of variance table similar to Table 11.3. Assume a random effects model. (b) Find point estimates for the variance among mixes, the variance among samples, and the error variance. Find 90 per cent confidence intervals for the standard deviations among subsamples, samples, and mixes. (c) Assuming the data to be for a mixed effects model $(\delta)$, state and test hypotheses about mixes and samples within mixes. (d) For mix number 2, test at the five per cent level to determine if the mean for samples 1 and 2 is significantly different from the mean for samples 3 and 4. Make the same test for mix number 5. Assume a mixed effects model $(\delta)$. (e) Assuming the data to be for a mixed effects model $(\alpha)$, state and test hypotheses about mixes and samples within mixes. (f) For a mixed effects model $(\alpha)$, test at the five per cent level to determine if the mean for mix 5 is significantly different from the mean of all the other mixes. Determine if the mean for mix 4 is significantly different from the mean of mixes 1, 2, and 3.

Table 11.11

Mix      Sample within     Subsample Observations
Number   Mix Number          1      2      3
  1          1              51     47     49
             2              68     44     51
             3              49     43     53
             4              35     43     58
  2          1              60     69     55
             2              39     59     72
             3              49     57     41
             4              60     51     54
  3          1              53     62     (one value illegible in the source)
             2              45     64     53
             3              45     56     59
             4              36     56     55
  4          1              53     64     66
             2              57     56     59
             3              66     73     47
             4              47     59     54
  5          1              62     75     59
             2              60     65     67
             3              84     63     72
             4              76     72     56

11.12. The model equation for sub-subsampling within a one-way classification may be written as

$$x_{ijku} = \mu + \alpha_i + \delta_{ij} + \eta_{ijk} + \epsilon_{ijku}$$

with indices $i$, $j$, $k$, and $u$.

(a) Write the corresponding observation equation, expressing each component part in terms of totals on $x_{ijku}$. Write the sum of squares identity and the analysis of variance table for such a model equation. (b) Write the expected mean squares for both the random effects and the fixed effects experiment. Find the expected mean squares for the mixed model $(\delta)$, the mixed model $(\alpha, \delta)$, and the mixed model $(\alpha, \eta)$.

11.13. Prove Eq. (11.46).

11.14. Derive the expected mean squares for each of the four models shown in Table 11.4.

11.15. Prove that $E(s_3^2) = \sigma^2$ for each of the four models shown in Table 11.4.

11.16. Prove that the expression given in (11.49) is an unbiased estimator of $\sigma^2 + b\sigma_\delta^2$.

11.17. Table 11.12 gives hypothetical data in a nested classification (the reader 


Table 11.12 


Experimenter 

A 

1 


B 



( 

- 


Batch 

1 

2 

I 

2 

3 

1 

2 

3 

4 


21 

19 

23 

15 

29 

32 

35 

33 

26 

1 

17 

13 

32 

34 

26 

45 

37 

42 

40 

Measurement 

19 

23 

21 

23 

36 


44 

46 



14 

5 


15 

25 



42 




13 


26 








can give his own interpretation to the measurements). (a) Prepare an analysis of variance, including the error mean squares, similar to Table 11.6. Assume the mixed model $(\alpha)$. (b) Find 95 per cent confidence interval estimates of $\sigma^2$ and $\sigma_\delta^2$. (c) Test the hypothesis $\alpha_A = \alpha_B = \alpha_C = 0$.

ANALYSIS OF VARIANCE -
MULTIWAY CLASSIFICATIONS


Concepts of the last two chapters are extended, and one-, two-, and three-way classifications are compared. The uses of single and repeated observations in a two-way classification with fixed, mixed, and random models are studied. The importance of equal sample sizes is explained, and the problem of missing data is discussed. The Latin square design is introduced and treated briefly. Relative efficiency is introduced as an aid in comparing designs.

12.1. INTRODUCTION

Anyone working with data knows that in most experimental situations more than one factor (variable) affects the outcome of the experiment. Indeed, more than two or three factors often noticeably affect the magnitude of the observations. However, extensions of topics in analysis of variance to many factors may be satisfactorily accomplished through a thorough understanding of the one-, two-, and three-factor designs. Hence, we now introduce the next most simple design: the randomized block design, a simple two-way crossed classification design.

There are many situations in which different treatments change noticeably from experimental unit to experimental unit. For example, the amount learned by each of three teaching methods (one factor) may vary considerably for students of six different intelligence quotient (I.Q.) groups (second factor). As another example, the water absorbed by activated alumina, measured as percentage weight gain, at three different temperatures might vary with particle size. Further, the responses of different treatments may be affected remarkably by different batches of raw material; the rate of gain of weight of animals of the same age may depend on the initial weights of the animals as well as on the diets.

In each of the above examples we think of the observations as being 
classified according to two criteria at once. In each case one variable may 
be considered primary and the other secondary to the purposes of the ex- 
periment. We call the primary variable the treatment factor and the secondary 
variable the block factor. In the first example, the teaching methods represent 
different treatments and the I.Q.’s represent different blocks or experimental 
units. In the second example, the temperatures are treatments, and the parti- 
cle size ranges are blocks. In the last example, the diets are treatments, and 
the weights of animals at the start of the experiment are blocks. 

The simplest randomized block design is one in which each treatment is 
applied exactly once in each block. In the alumina example, suppose that 
the three treatments are 50°, 60°, and 70°F and the four blocks are for particle 
size ranges of 1-2 mesh, 2-4 mesh, 4-8 mesh, and 8-14 mesh. Suppose one 
measurement is made of the percentage of water absorbed by particles in 
the 1-2 mesh range at 50°F, one measurement is made of the percentage of 
water absorbed by particles in the 1-2 mesh range at 60°F, etc. In this way 
12 measurements are made in an experiment which has a randomized block 
design with one observation per cross classification or, for short, per cell. 
The name “randomized block” is used because the treatments for any block 
are applied in random order. Thus, in the block with particle size in the 
4-8 mesh range, the order of application of the temperatures to quantities 
of about the same initial weight could be 70°, 50°, and 60°F. An example 
of the work order (reading from left to right) of the whole experiment is 
shown in Fig. 12.1. 


Fig. 12.1 Layout for an Experiment (order of temperature application within each block of particle size range; figure not reproduced)
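The randomization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `randomized_block_order`, the labels, and the seed are hypothetical, and the order it produces is not the one shown in Fig. 12.1.

```python
import random

def randomized_block_order(treatments, blocks, seed=None):
    """Return, for each block, all treatments in an independently shuffled order."""
    rng = random.Random(seed)
    return {block: rng.sample(treatments, k=len(treatments)) for block in blocks}

# Labels follow the alumina example; the seed is arbitrary (hypothetical).
temperatures = ["50F", "60F", "70F"]
mesh_ranges = ["1-2 mesh", "2-4 mesh", "4-8 mesh", "8-14 mesh"]

layout = randomized_block_order(temperatures, mesh_ranges, seed=12)
for block, order in layout.items():
    print(block, order)
```

Each block receives every treatment exactly once, so the layout always contains 3 x 4 = 12 cells; only the order within a block is random.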

Much of the analysis for the randomized block design with a single ob- 
servation per cell is a simple extension of the analysis for the completely 
randomized design. Thus, we give an example to illustrate methods of 
estimation and testing before we present the general theory, models, and 
assumptions required for such methods. 

12.2. AN APPLICATION OF THE RANDOMIZED BLOCK DESIGN 

As an illustration of the nature of the randomized block design as compared 
with a completely randomized design, consider the following. Suppose 
it is required to determine whether the acidity level of the soil in a garden 
is the same at three different depths, say one, seven, and 13 in. Further, suppose 
that four recordings of the acidity level in terms of pH units are to be 
made at each depth of soil. Now 12 core samples of soil could be taken at 
random, four at 1 in., four at 7 in., and four at 13 in. depth. The results of 
such an experiment would then be analyzed according to the one-way 
classification design. On the other hand, four core samples could be randomly 
selected, and then the acidity level of each could be tested at each of the 
three depths of soil, the order being random. The data collected in this way 
should be analyzed as a randomized block design. 

If the manual labor involved in obtaining the 12 measurements of 
acidity is assumed to be negligible, which design is preferred? This question 
cannot be answered fully at this time, but some light can be thrown on the 
problem. It can be shown that the paired observations experiment described 
in Sect. 8.7 is a special case of the randomized block design and the 
independent observations experiment described in Sect. 8.6 is a special case 
of the completely randomized design. Thus, the randomized block design 
is better (worse) than the completely randomized design in situations like 
those where the paired observation experiment is better (worse) than the 
independent observations experiment. Since the pH value in the acidity 
experiment is likely to be influenced by the particular location in the garden 
from which the soil sample is taken, it would appear that the randomized 
block design is better than the completely randomized design. That is due 
to the fact that one pH value is determined for each depth at any location, 
and hence the effect of the location can be removed in any comparison among 
the treatment means. But in the completely randomized design we cannot 
be sure to what extent the location of the soil samples influences our 
conclusions regarding the treatments. Further comparisons in the two designs 
will be made later. Now we give methods of estimating parameters and 
testing hypotheses relating to a randomized block design. A useful notation 
is introduced in the discussion of methods. 

Example 12.1. Data on the acidity level of 12 soil samples in the randomized 
block design described above are arranged in Table 12.1 for computational 
purposes. (a) Use a five per cent level test to determine if the acidity 
level is the same at different depths. (b) Find estimates of treatment means 
and variance components. 


Table 12.1 

Acidity Level in pH Units of Soil 

Core Sample            Depth of Soil 
Number            1 in.     7 in.     13 in. 
1                 6.7       6.5       6.3 
2                 6.4       6.6       6.2 
3                 6.9       6.8       6.4 
4                 6.4       6.5       6.3 

To estimate means and effects and to prepare an analysis of variance 
table, we need the totals, means, and effects along with the notation shown 
in Table 12.2. The sum of squares identity is given by 


Table 12.2 

Acidity Level, in pH Units, of 12 Soil Samples 

Core                  Depth of Soil in Inches                     Core Total         Core Mean 
Sample          1                 7                 13 
1          $x_{11} = 6.7$    $x_{21} = 6.5$    $x_{31} = 6.3$    $T_{.1} = 19.5$    $\bar{x}_{.1} = 6.5$ 
2          $x_{12} = 6.4$    $x_{22} = 6.6$    $x_{32} = 6.2$    $T_{.2} = 19.2$    $\bar{x}_{.2} = 6.4$ 
3          $x_{13} = 6.9$    $x_{23} = 6.8$    $x_{33} = 6.4$    $T_{.3} = 20.1$    $\bar{x}_{.3} = 6.7$ 
4          $x_{14} = 6.4$    $x_{24} = 6.5$    $x_{34} = 6.3$    $T_{.4} = 19.2$    $\bar{x}_{.4} = 6.4$ 
Depth totals   $T_{1.} = 26.4$   $T_{2.} = 26.4$   $T_{3.} = 25.2$   $T = 78.0$ 
Depth means    $\bar{x}_{1.} = 6.6$   $\bar{x}_{2.} = 6.6$   $\bar{x}_{3.} = 6.3$                      $\bar{x} = 6.5$ 
Depth effects  $a_1 = 0.1$       $a_2 = 0.1$       $a_3 = -0.2$ 




\[ \text{Total SS} = \text{Treatment SS} + \text{Block SS} + \text{Error SS} \tag{12.1} \]

where, in our example, the sums of squares are given by 

\[ \text{Treatment SS} = \frac{\sum_{i=1}^{3} T_{i.}^2}{4} - \frac{T^2}{12} = \frac{(26.4)^2 + (26.4)^2 + (25.2)^2}{4} - \frac{(78.0)^2}{12} = 507.24 - 507.00 = 0.24 \]

\[ \text{Block SS} = \frac{\sum_{j=1}^{4} T_{.j}^2}{3} - \frac{T^2}{12} = \frac{(19.5)^2 + \cdots + (19.2)^2}{3} - \frac{(78.0)^2}{12} = 507.18 - 507.00 = 0.18 \]

\[ \text{Total SS} = \sum_{i=1}^{3} \sum_{j=1}^{4} x_{ij}^2 - \frac{T^2}{12} = (6.7)^2 + \cdots + (6.3)^2 - \frac{(78.0)^2}{12} = 507.50 - 507.00 = 0.50 \]

and, by subtraction 

\[ \text{Error SS} = 0.50 - 0.24 - 0.18 = 0.08 \]

Associated with the sum of squares identity there is a degrees of freedom 
identity, which is illustrated in Table 12.3. For the present the student may 
determine the degrees of freedom for the treatment, block and total sum 
of squares by the principle already explained and then find the degrees of 
freedom for error by subtraction. A justification for the degrees of freedom 
for the error sum of squares will be given later. In our example, the treatment 
effects $\alpha_i$ are fixed, and the blocks are randomly selected from a population 
of blocks with variance $\sigma_b^2$. Further, we assume that the single observation 
in each cell is randomly and independently drawn from a population with 
variance $\sigma_{ij}^2 = \sigma^2$. That is, each of the twelve populations has the same 
variance. If we assume the effects $\alpha_i$ to be defined so that 

\[ \sum_{i=1}^{3} \alpha_i = 0 \]

it can be shown by the methods of Chap. 11 that the expected values for 
the mean squares are those given in Table 12.3. 


Table 12.3 

Analysis of Variance for Soil Acidity 

Source of      Sum of     Degrees of    Mean Square          Expected Mean Square 
Variation      Squares    Freedom 
Blocks         0.18       3             0.06 = $s_b^2$       $\sigma^2 + 3\sigma_b^2$ 
Treatments     0.24       2             0.12 = $s_t^2$       $\sigma^2 + 4 \sum \alpha_i^2 / 2$ 
Error          0.08       6             0.04/3 = $s_e^2$     $\sigma^2$ 
Total          0.50       11 
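The arithmetic behind Table 12.3 can be checked with a short Python sketch. The function name `randomized_block_ss` and the dictionary layout are illustrative choices, not notation from the text; the data are those of Example 12.1, with rows as core samples (blocks) and columns as depths (treatments).

```python
# Rows: core samples 1-4 (blocks). Columns: depths of 1, 7, and 13 in. (treatments).
acidity = [
    [6.7, 6.5, 6.3],
    [6.4, 6.6, 6.2],
    [6.9, 6.8, 6.4],
    [6.4, 6.5, 6.3],
]

def randomized_block_ss(table):
    """Sums of squares from totals, one observation per cell."""
    r = len(table)       # number of blocks
    c = len(table[0])    # number of treatments
    block_totals = [sum(row) for row in table]
    treatment_totals = [sum(row[i] for row in table) for i in range(c)]
    grand_total = sum(block_totals)
    correction = grand_total ** 2 / (r * c)          # T^2 / 12 in the example
    total_ss = sum(x * x for row in table for x in row) - correction
    treatment_ss = sum(t * t for t in treatment_totals) / r - correction
    block_ss = sum(t * t for t in block_totals) / c - correction
    error_ss = total_ss - treatment_ss - block_ss    # found by subtraction
    return {"treatment": treatment_ss, "block": block_ss,
            "error": error_ss, "total": total_ss}

ss = randomized_block_ss(acidity)
df = {"treatment": 2, "block": 3, "error": 6, "total": 11}
mean_squares = {k: ss[k] / df[k] for k in ("treatment", "block", "error")}
print(ss)
print(mean_squares)
```

Each divisor is the number of observations entering the corresponding total: four per depth total, three per core total, and twelve for the grand total.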




Letting the treatment means for depths of 1, 7, and 13 in. be $\mu_{1.}$, $\mu_{2.}$, 
and $\mu_{3.}$, respectively, we wish to test the hypothesis 

\[ H_0: \mu_{1.} = \mu_{2.} = \mu_{3.} = \mu \quad \left(\text{or } \alpha_1 = \alpha_2 = \alpha_3 = 0 \text{ or } \sum \alpha_i^2 = 0\right) \tag{12.2} \]

where $\mu$ is the over-all mean. The alternative hypothesis is that at least one 
of the treatment means is different from the over-all population mean or, 
for short, $\sum \alpha_i^2 \neq 0$. If, in addition to the assumptions already made, we 
assume that the 12 populations for cells are normally and independently 
distributed, then it can be shown that the ratio $s_t^2/s_e^2$ of the treatment mean 
square to the error mean square is distributed as F with 
two and six degrees of freedom when the null hypothesis is true. When the 
alternative hypothesis is true, the numerator $s_t^2$ of the ratio $s_t^2/s_e^2$ on the 
average is larger than $\sigma^2$. Hence the critical region for the test of (12.2) 
is made up of all those values of F for which $F > F_{.05}(2, 6) = 5.14$. Since 
the particular experimental ratio 

\[ F = \frac{s_t^2}{s_e^2} = \frac{0.12}{0.04/3} = 9.0 \]

is greater than 5.14, we reject the hypothesis (12.2) and conclude that the 
treatment means are different. (Actually, on looking at Table 12.2, we would 
say that the acidity level at 13 in. is less than the average acidity level at one 
and seven in.) 

Unbiased point estimates of the treatment means and effects, 
$a_i = \bar{x}_{i.} - \bar{x}$, are shown in Table 12.2. We are not interested in the core mean 
estimates. However, it is useful to know the unbiased point estimate $\hat{\sigma}_b^2$ 
of the core variance component $\sigma_b^2$, given by 

\[ \hat{\sigma}_b^2 = \frac{s_b^2 - s_e^2}{3} = \frac{0.06 - 0.04/3}{3} = 0.0156 \]

where $s_b^2$ is the block mean square and $s_e^2$ the error mean square. 
Further, an interval estimate of means may be found by the method explained 
in Example 10.1, and an approximate interval estimate of $\sigma_b^2$ may be found 
by the methods given in Sect. 11.2. 

To test the hypothesis that $\sigma_b^2 = 0$ against the alternative hypothesis that 
$\sigma_b^2 > 0$, compute $s_b^2/s_e^2$ and compare with the upper $\alpha$ level F value with three 
and six degrees of freedom. If $s_b^2/s_e^2$ is greater than this value, reject the 
hypothesis $\sigma_b^2 = 0$ and conclude that the block component of variance exists; 
otherwise, fail to reject the hypothesis. 
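As a rough check on the tests and estimates of Example 12.1, the following Python sketch recomputes the two F ratios and the block variance component from the mean squares. The variable names are illustrative; the critical value 5.14 is the one quoted in the text for two and six degrees of freedom, and no critical value is assumed here for the block test.

```python
# Mean squares of Example 12.1: sum of squares divided by degrees of freedom.
ms_blocks = 0.18 / 3
ms_treatments = 0.24 / 2
ms_error = 0.08 / 6

n_treatments = 3          # observations in each block (one per depth)
f_critical_2_6 = 5.14     # F_.05(2, 6), as quoted in the text

# Ratio used to test equality of the treatment (depth) means.
f_treatments = ms_treatments / ms_error
reject_equal_depth_means = f_treatments > f_critical_2_6

# Ratio used to test whether the block variance component is zero;
# it is referred to an F table with three and six degrees of freedom.
f_blocks = ms_blocks / ms_error

# Block mean square estimates sigma^2 + 3 * sigma_b^2, so solve for sigma_b^2.
block_variance_component = (ms_blocks - ms_error) / n_treatments

print(round(f_treatments, 2), reject_equal_depth_means)
print(round(f_blocks, 2), round(block_variance_component, 4))
```

The divisor 3 in the variance component is the coefficient of the block variance in the expected mean square for blocks.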


12.3. TWO-WAY CLASSIFICATIONS, SINGLE OBSERVATION PER CELL, 
FIXED MODEL 

Think of $x_{iju}$ as being the uth (u = 1, . . . , n) random observation in the 
ith (i = 1, . . . , c) group of factor one and the jth (j = 1, . . . , r) group of 
factor two. Assume that $x_{iju}$ is distributed with mean $\mu_{ij}$ and variance 
$\sigma_{ij}^2 = \sigma^2$; that is, each of the cr populations has the same variance. Thus, 
if $\varepsilon_{iju}$ denotes the amount the random observation $x_{iju}$ deviates from its 
mean $\mu_{ij}$, we may write the model equation 

\[ x_{iju} = \mu_{ij} + \varepsilon_{iju} \quad (i = 1, \ldots, c;\ j = 1, \ldots, r;\ u = 1, \ldots, n) \tag{12.3} \]

In the special case where n = 1, that is, where there is a single random 
observation per cell, we drop the third subscript and write 

\[ x_{ij} = \mu_{ij} + \varepsilon_{ij} \quad (i = 1, \ldots, c;\ j = 1, \ldots, r) \tag{12.4} \]

understanding that $\mu_{ij}$ is a fixed parameter and that $\varepsilon_{ij}$ is a single random 
deviate about the mean $\mu_{ij}$. 

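In Example 12.1 we arranged the data in a rectangular table for ease 
of computation. Such a table is useful in understanding the general two-way 
classification. Use c columns to represent the c groups of factor one 
(treatments) and r rows to represent the r groups of factor two (blocks). Then the 
means of the cr populations along with c column means and r row means 
may be arranged as in Table 12.4. 


Table 12.4 

Population Means in a Two-Way Classification 

Row                          Column Number                          Row            Row Effects 
Number            1      . . .      i      . . .      c             Means 
1            $\mu_{11}$   . . .   $\mu_{i1}$   . . .   $\mu_{c1}$        $\mu_{.1}$     $\beta_1 = \mu_{.1} - \mu$ 
. . . 
j            $\mu_{1j}$   . . .   $\mu_{ij}$   . . .   $\mu_{cj}$        $\mu_{.j}$     $\beta_j = \mu_{.j} - \mu$ 
. . . 
r            $\mu_{1r}$   . . .   $\mu_{ir}$   . . .   $\mu_{cr}$        $\mu_{.r}$     $\beta_r = \mu_{.r} - \mu$ 
Column means     $\mu_{1.}$   . . .   $\mu_{i.}$   . . .   $\mu_{c.}$        $\mu$ 
Column effects   $\alpha_1 = \mu_{1.} - \mu$   . . .   $\alpha_i = \mu_{i.} - \mu$   . . .   $\alpha_c = \mu_{c.} - \mu$ 


The ith column mean $\mu_{i.}$ and jth row mean $\mu_{.j}$ are defined by 

\[ \mu_{i.} = \frac{1}{r} \sum_{j=1}^{r} \mu_{ij} \quad (i = 1, \ldots, c) \tag{12.5} \]

and 

\[ \mu_{.j} = \frac{1}{c} \sum_{i=1}^{c} \mu_{ij} \quad (j = 1, \ldots, r) \tag{12.6} \]

respectively, and the over-all mean $\mu$ of the cr populations by 

\[ \mu = \frac{1}{cr} \sum_{i=1}^{c} \sum_{j=1}^{r} \mu_{ij} \tag{12.7} \]

Further 

\[ \alpha_i = \mu_{i.} - \mu \quad (i = 1, \ldots, c) \tag{12.8} \]

and 

\[ \beta_j = \mu_{.j} - \mu \quad (j = 1, \ldots, r) \tag{12.9} \]

denote the deviations of the column and row means from the over-all mean, 
and they are called the column and row effects, respectively. Thus, the cell 
mean may be written as 

\[ \mu_{ij} = \mu + (\mu_{i.} - \mu) + (\mu_{.j} - \mu) + (\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu) \tag{12.10} \]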
or, in the important particular case, as 




SECT. 12.3. 


ANALYSIS OF VARIANCE MULTIWAY CLASSIFICATIONS 


401 


\[ \mu_{ij} = \mu + \alpha_i + \beta_j \quad (i = 1, \ldots, c;\ j = 1, \ldots, r) \tag{12.11} \]

when 

\[ \mu_{ij} - \mu_{i.} - \mu_{.j} + \mu = 0 \tag{12.12} \]

In case Eq. (12.12) holds, we see that the cell mean $\mu_{ij}$ can be expressed as a 
constant $\mu$ plus a column effect $\alpha_i$ which is the same for all cells in column 
i plus a row effect $\beta_j$ which is the same for all cells in row j. This is made 
clear in Table 12.5. Further, we see from Eq. (12.11) that the cr parameters 
$\mu_{ij}$ can be expressed in terms of 1 + c + r parameters $\mu$; $\alpha_1, \ldots, \alpha_c$; 
$\beta_1, \ldots, \beta_r$. 


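Table 12.5 

Population Means in Terms of Effects 

Row                                        Column Number                                                     Row Means 
Number     1                                      . . .   i                                      . . .   c 
1          $\mu_{11} = \mu + \alpha_1 + \beta_1$   . . .   $\mu_{i1} = \mu + \alpha_i + \beta_1$   . . .   $\mu_{c1} = \mu + \alpha_c + \beta_1$   $\mu_{.1} = \mu + \beta_1$ 
. . . 
j          $\mu_{1j} = \mu + \alpha_1 + \beta_j$   . . .   $\mu_{ij} = \mu + \alpha_i + \beta_j$   . . .   $\mu_{cj} = \mu + \alpha_c + \beta_j$   $\mu_{.j} = \mu + \beta_j$ 
. . . 
r          $\mu_{1r} = \mu + \alpha_1 + \beta_r$   . . .   $\mu_{ir} = \mu + \alpha_i + \beta_r$   . . .   $\mu_{cr} = \mu + \alpha_c + \beta_r$   $\mu_{.r} = \mu + \beta_r$ 
Column means   $\mu_{1.} = \mu + \alpha_1$   . . .   $\mu_{i.} = \mu + \alpha_i$   . . .   $\mu_{c.} = \mu + \alpha_c$   $\mu$ 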


Because of the way $\alpha_i$ and $\beta_j$ are defined, we obtain the following two 
linear restrictions involving the newly introduced parameters 

\[ \sum_{i=1}^{c} \alpha_i = 0 \quad \text{and} \quad \sum_{j=1}^{r} \beta_j = 0 \tag{12.13} \]

That is, only (c - 1) $\alpha_i$'s and (r - 1) $\beta_j$'s are independent. Thus, 
the cr parameters $\mu_{ij}$ in Eq. (12.11) can actually be expressed in 
terms of 1 + (c - 1) + (r - 1) = c + r - 1 independent parameters $\mu$; 
$\alpha_1, \ldots, \alpha_{c-1}$; $\beta_1, \ldots, \beta_{r-1}$. 
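The definitions in Eqs. (12.5) to (12.13) can be illustrated with a small Python sketch. The table of cell means below is hypothetical (built from an over-all mean of 10, column effects 1, 0, -1 and row effects 2, -2), and the function name `decompose_means` is an illustrative choice.

```python
def decompose_means(mu):
    """Split cell means mu[j][i] (row j, column i) into over-all mean,
    column effects, row effects, and the non-additive remainder."""
    r, c = len(mu), len(mu[0])
    grand = sum(sum(row) for row in mu) / (r * c)
    col_means = [sum(mu[j][i] for j in range(r)) / r for i in range(c)]
    row_means = [sum(mu[j]) / c for j in range(r)]
    alpha = [m - grand for m in col_means]      # column effects
    beta = [m - grand for m in row_means]       # row effects
    remainder = [[mu[j][i] - col_means[i] - row_means[j] + grand
                  for i in range(c)] for j in range(r)]
    return grand, alpha, beta, remainder

# Hypothetical additive cell means: 2 rows by 3 columns.
cell_means = [[13.0, 12.0, 11.0],
              [9.0, 8.0, 7.0]]

grand, alpha, beta, remainder = decompose_means(cell_means)
print(grand, alpha, beta)
print(remainder)   # all zeros when the means are additive
```

The effects sum to zero by construction, which is why only c - 1 column effects and r - 1 row effects are free; a remainder of zero in every cell is the additivity condition.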

Substituting Eq. (12.11) in Eq. (12.4) gives the model equation 

\[ x_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij} \quad (i = 1, \ldots, c;\ j = 1, \ldots, r) \tag{12.14} \]

This is the form we use most of the time, since our main concern is in making 
statements about either the column effects or the row effects or both. For 
example, typical hypotheses tested by analysis of variance are 

\[ H_0: \alpha_1 = \cdots = \alpha_c = 0 \quad (\text{or } \mu_{1.} = \cdots = \mu_{c.} = \mu) \tag{12.15} \]

\[ H_0: \beta_1 = \cdots = \beta_r = 0 \quad (\text{or } \mu_{.1} = \cdots = \mu_{.r} = \mu) \tag{12.16} \]


The model equation for the two-way classification given by Eq. (12.14) is 
said to be additive. This name is used because the mean of any cell is obtained 
by adding a row effect to a column effect to an over-all (constant) effect. 
Now consider the cr populations with means $\mu_{ij}$ which are shown in 
Table 12.4. Select at random one observation from each population, and let 
each selection be independent of all others. Arrange these observations as 
shown in Table 12.6 and compute the means indicated in the margin. That 
is, compute the means $\bar{x}_{i.}$, $\bar{x}_{.j}$, and $\bar{x}$ by the formulas 

\[ \bar{x}_{i.} = \frac{1}{r} \sum_{j=1}^{r} x_{ij}, \quad \bar{x}_{.j} = \frac{1}{c} \sum_{i=1}^{c} x_{ij}, \quad \bar{x} = \frac{1}{cr} \sum_{i=1}^{c} \sum_{j=1}^{r} x_{ij} \tag{12.17} \]

respectively. The estimated effects of the columns and rows are given by 

\[ a_i = \bar{x}_{i.} - \bar{x} \quad \text{and} \quad b_j = \bar{x}_{.j} - \bar{x} \tag{12.18} \]

respectively. 


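Table 12.6 

Two-Way Classification with One Observation per Cell 

Row                        Column Number                          Row              Row Effects 
Number            1      . . .      i      . . .      c           Means 
1            $x_{11}$   . . .   $x_{i1}$   . . .   $x_{c1}$        $\bar{x}_{.1}$   $b_1 = \bar{x}_{.1} - \bar{x}$ 
. . . 
j            $x_{1j}$   . . .   $x_{ij}$   . . .   $x_{cj}$        $\bar{x}_{.j}$   $b_j = \bar{x}_{.j} - \bar{x}$ 
. . . 
r            $x_{1r}$   . . .   $x_{ir}$   . . .   $x_{cr}$        $\bar{x}_{.r}$   $b_r = \bar{x}_{.r} - \bar{x}$ 
Column means     $\bar{x}_{1.}$   . . .   $\bar{x}_{i.}$   . . .   $\bar{x}_{c.}$   $\bar{x}$ 
Column effects   $a_1 = \bar{x}_{1.} - \bar{x}$   . . .   $a_i = \bar{x}_{i.} - \bar{x}$   . . .   $a_c = \bar{x}_{c.} - \bar{x}$ 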




It is easy to show that $\bar{x}$, $\bar{x}_{i.}$, $\bar{x}_{.j}$, $a_i$, and $b_j$ are unbiased estimators of 
the parameters $\mu$, $\mu_{i.}$, $\mu_{.j}$, $\alpha_i$, and $\beta_j$, respectively. Further, if we let 

\[ \hat{x}_{ij} = \bar{x} + a_i + b_j \tag{12.19} \]

it follows that $\hat{x}_{ij}$ is an unbiased estimator of the cell mean $\mu_{ij}$, since 

\[ E(\hat{x}_{ij}) = E(\bar{x} + a_i + b_j) = \mu + \alpha_i + \beta_j = \mu_{ij} \]

It should be noted that the unbiased property follows, in each case, from the 
definitions of the model equation (12.14) and the assumptions of random 
observations. (We do not need the assumptions of equal variances and normal 
populations.) 

If $e_{ij}$ denotes the amount a random observation $x_{ij}$ deviates from the 
estimated cell mean $\hat{x}_{ij}$, then we may write the observation equation as 

\[ x_{ij} = \hat{x}_{ij} + e_{ij} \quad (i = 1, \ldots, c;\ j = 1, \ldots, r) \tag{12.20} \]

or, substituting Eq. (12.19) in Eq. (12.20), as 

\[ x_{ij} = \bar{x} + a_i + b_j + e_{ij} \quad (i = 1, \ldots, c;\ j = 1, \ldots, r) \tag{12.21} \]

Clearly, $e_{ij}$ is an unbiased estimator of $\varepsilon_{ij}$. 
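The estimators of Eqs. (12.17) to (12.21) can be computed directly from a table of observations. The sketch below applies them to the acidity data of Example 12.1; the function name `fit_additive` and the row-by-column storage of the table are illustrative assumptions, not part of the text.

```python
def fit_additive(table):
    """table[j][i] is the observation in row (block) j and column (treatment) i.
    Returns the over-all mean, column effects, row effects, estimated cell
    means, and residuals for the additive model."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    a = [sum(table[j][i] for j in range(r)) / r - grand for i in range(c)]
    b = [sum(table[j]) / c - grand for j in range(r)]
    fitted = [[grand + a[i] + b[j] for i in range(c)] for j in range(r)]
    residuals = [[table[j][i] - fitted[j][i] for i in range(c)]
                 for j in range(r)]
    return grand, a, b, fitted, residuals

# Acidity data: rows are core samples, columns are depths of 1, 7, and 13 in.
acidity = [
    [6.7, 6.5, 6.3],
    [6.4, 6.6, 6.2],
    [6.9, 6.8, 6.4],
    [6.4, 6.5, 6.3],
]

grand, a, b, fitted, residuals = fit_additive(acidity)
print(round(grand, 2), [round(v, 2) for v in a], [round(v, 2) for v in b])
for row in fitted:
    print([round(v, 2) for v in row])
```

Every observation is recovered exactly as over-all mean plus column effect plus row effect plus residual, which is the observation equation above.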

In Sect. 10.3 we developed two sum of squares identities for the one-way 
classification. The first identity involved parameters of the model, and the 
second identity did not. (We learned in Sect. 10.3 the advantages of the 
second identity.) Now we could develop similar sum of squares identities 
for the two-way classification, but, since we wish to explain test procedures 
for the hypotheses in (12.15) and (12.16), we discuss only the second type 
of identity. The derivation of the first sum of squares identity is left to the 
student as an exercise. 

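We can partition the deviate of the observation value $x_{ij}$ about the 
over-all mean $\bar{x}$ into the sum of three deviates as follows 

\[ x_{ij} - \bar{x} = (\bar{x}_{i.} - \bar{x}) + (\bar{x}_{.j} - \bar{x}) + (x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}) \tag{12.22} \]

Squaring both sides of Eq. (12.22) and summing over all observations gives 

\[ \begin{aligned}
\sum_i \sum_j (x_{ij} - \bar{x})^2 ={}& \sum_i \sum_j (\bar{x}_{i.} - \bar{x})^2 + \sum_i \sum_j (\bar{x}_{.j} - \bar{x})^2 \\
&+ \sum_i \sum_j (x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x})^2 \\
&+ 2 \sum_i \sum_j (\bar{x}_{i.} - \bar{x})(\bar{x}_{.j} - \bar{x}) \\
&+ 2 \sum_i \sum_j (\bar{x}_{i.} - \bar{x})(x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}) \\
&+ 2 \sum_i \sum_j (\bar{x}_{.j} - \bar{x})(x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x})
\end{aligned} \tag{12.23} \]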

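Since the last three terms on the right-hand side of Eq. (12.23) are zero, 
we may write the desired sum of squares identity as 

\[ SST = SSC + SSR + SSE \tag{12.24} \]

where 

\[ \begin{aligned}
SST &= \sum_i \sum_j (x_{ij} - \bar{x})^2, \qquad SSC = \sum_i \sum_j (\bar{x}_{i.} - \bar{x})^2 = r \sum_i a_i^2 \\
SSR &= \sum_i \sum_j (\bar{x}_{.j} - \bar{x})^2 = c \sum_j b_j^2 \\
SSE &= \sum_i \sum_j (x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x})^2 = \sum_i \sum_j e_{ij}^2
\end{aligned} \tag{12.25} \]

Note that SST denotes "total sum of squares," SSC "sum of squares for 
column means," SSR "sum of squares for row means," and SSE "sum of 
squares for error (or residual or remainder)." (The word "means" could 
be replaced by the word "effects" in the definition of SSC and SSR.) As an 
example of how the last three terms on the right-hand side of Eq. (12.23) reduce 
to zero, we use Eq. (12.17) to write 

\[ \begin{aligned}
\sum_i \sum_j (\bar{x}_{i.} - \bar{x})(x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x})
&= \sum_i (\bar{x}_{i.} - \bar{x}) \sum_j (x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}) \\
&= \sum_i (\bar{x}_{i.} - \bar{x}) \Big( \sum_j x_{ij} - \sum_j \bar{x}_{i.} - \sum_j \bar{x}_{.j} + \sum_j \bar{x} \Big) \\
&= \sum_i (\bar{x}_{i.} - \bar{x}) (r\bar{x}_{i.} - r\bar{x}_{i.} - r\bar{x} + r\bar{x}) \\
&= \sum_i (\bar{x}_{i.} - \bar{x})(0) = 0
\end{aligned} \]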

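In Example 12.1 the sums of squares were computed by using totals rather 
than means. Now, if we denote the totals by 

\[ T_{i.} = \sum_j x_{ij}, \quad T_{.j} = \sum_i x_{ij}, \quad T = \sum_i \sum_j x_{ij} \tag{12.26} \]

so that the means in Eq. (12.17) may be written as 

\[ \bar{x}_{i.} = \frac{T_{i.}}{r}, \quad \bar{x}_{.j} = \frac{T_{.j}}{c}, \quad \text{and} \quad \bar{x} = \frac{T}{cr} \]

it is easy to show that the sums of squares in Eq. (12.25) reduce to 

\[ \begin{aligned}
SST &= \sum_i \sum_j x_{ij}^2 - \frac{T^2}{cr} \\
SSC &= r \sum_i (\bar{x}_{i.} - \bar{x})^2 = \frac{\sum_i T_{i.}^2}{r} - \frac{T^2}{cr} \\
SSR &= c \sum_j (\bar{x}_{.j} - \bar{x})^2 = \frac{\sum_j T_{.j}^2}{c} - \frac{T^2}{cr} \\
SSE &= \sum_i \sum_j x_{ij}^2 - \frac{\sum_i T_{i.}^2}{r} - \frac{\sum_j T_{.j}^2}{c} + \frac{T^2}{cr}
\end{aligned} \tag{12.27} \]

Note that the divisor in each quotient in Eq. (12.27) is the number of 
observations used to obtain the total in the dividend. Also, note that SSE is 
usually found by subtraction as follows 

SSE = SST - SSC - SSR 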
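As a worked illustration of how the computing forms arise, the reduction for SSC can be sketched as follows, using only the definitions of the totals and means and the fact that the column means sum to c times the over-all mean. The reduction for SSR is the same with rows and columns interchanged.

```latex
\begin{aligned}
SSC &= r \sum_{i=1}^{c} (\bar{x}_{i.} - \bar{x})^2
     = r \sum_{i} \bar{x}_{i.}^2 - 2 r \bar{x} \sum_{i} \bar{x}_{i.} + r c \bar{x}^2 \\
    &= r \sum_{i} \bar{x}_{i.}^2 - 2 r c \bar{x}^2 + r c \bar{x}^2
     \qquad \Big(\text{since } \sum_{i} \bar{x}_{i.} = c \bar{x}\Big) \\
    &= r \sum_{i} \left(\frac{T_{i.}}{r}\right)^2 - c r \left(\frac{T}{cr}\right)^2
     = \frac{\sum_{i} T_{i.}^2}{r} - \frac{T^2}{cr}
\end{aligned}
```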

The expression (12.24) is an algebraic identity. It is not dependent upon 
any assumption relating to the model. That is, given any rectangular array 
of numbers, we can partition the sum of squares of the deviates of the 
observations about the over-all mean into three component sum of squares 
as we did in (12.24). Of course, we might not be able to give any meaningful 
interpretation to such a partition in many problems, but the partition can 
still be made. 
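That the partition is purely algebraic can be checked numerically on an arbitrary rectangular array. The Python sketch below uses seeded pseudo-random numbers that follow no particular model; the function name `partition`, the array size, and the seed are illustrative assumptions.

```python
import random

def partition(table):
    """Return (SST, SSC, SSR, SSE) for table[j][i], row j and column i."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    col_means = [sum(table[j][i] for j in range(r)) / r for i in range(c)]
    row_means = [sum(table[j]) / c for j in range(r)]
    sst = sum((x - grand) ** 2 for row in table for x in row)
    ssc = r * sum((m - grand) ** 2 for m in col_means)
    ssr = c * sum((m - grand) ** 2 for m in row_means)
    sse = sum((table[j][i] - col_means[i] - row_means[j] + grand) ** 2
              for j in range(r) for i in range(c))
    return sst, ssc, ssr, sse

# An arbitrary 7 x 5 array of numbers with no model behind it.
rng = random.Random(0)
arbitrary = [[rng.uniform(-100, 100) for _ in range(5)] for _ in range(7)]

sst, ssc, ssr, sse = partition(arbitrary)
identity_gap = abs(sst - (ssc + ssr + sse))
print(identity_gap)   # zero up to floating-point rounding
```

Here SSE is computed from its own definition rather than by subtraction, so agreement of the two sides is a genuine check of the identity.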

If we assume that the additive model equation (12.14) has fixed effects 
$\alpha_i$ (i = 1, . . . , c) and $\beta_j$ (j = 1, . . . , r) and random deviates $\varepsilon_{ij}$ which are 
independently distributed with common variance $\sigma^2$ and zero means, then 
it can be shown by the methods of Sect. 11.3 that the expectations of the 
sum of squares in Eq. (12.27) are 

\[ \begin{aligned}
E(SSC) &= (c - 1)\sigma^2 + r \sum_i \alpha_i^2 \\
E(SSR) &= (r - 1)\sigma^2 + c \sum_j \beta_j^2 \\
E(SSE) &= (c - 1)(r - 1)\sigma^2 \\
E(SST) &= (cr - 1)\sigma^2 + r \sum_i \alpha_i^2 + c \sum_j \beta_j^2
\end{aligned} \tag{12.28} \]
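The expectations of the sums of squares can be checked roughly by simulation. The Python sketch below draws repeated tables from the additive fixed model with normal deviates and compares the averages of SSC, SSR, and SSE with the expectations stated above. The effects, sigma, number of repetitions, and seed are all hypothetical choices, and the over-all mean is taken as zero since it does not affect any of the sums of squares.

```python
import random

def simulate_mean_ss(alpha, beta, sigma, reps=4000, seed=1):
    """Average SSC, SSR, SSE over repeated samples from the additive model."""
    rng = random.Random(seed)
    c, r = len(alpha), len(beta)
    sums = {"SSC": 0.0, "SSR": 0.0, "SSE": 0.0}
    for _ in range(reps):
        x = [[alpha[i] + beta[j] + rng.gauss(0.0, sigma) for i in range(c)]
             for j in range(r)]
        grand = sum(map(sum, x)) / (c * r)
        col = [sum(x[j][i] for j in range(r)) / r for i in range(c)]
        row = [sum(x[j]) / c for j in range(r)]
        sums["SSC"] += r * sum((m - grand) ** 2 for m in col)
        sums["SSR"] += c * sum((m - grand) ** 2 for m in row)
        sums["SSE"] += sum((x[j][i] - col[i] - row[j] + grand) ** 2
                           for j in range(r) for i in range(c))
    return {k: v / reps for k, v in sums.items()}

alpha = [0.5, 0.0, -0.5]        # hypothetical column effects (sum to zero)
beta = [1.0, 0.0, 0.0, -1.0]    # hypothetical row effects (sum to zero)
sigma = 1.0
c, r = len(alpha), len(beta)

averages = simulate_mean_ss(alpha, beta, sigma)
expected = {
    "SSC": (c - 1) * sigma ** 2 + r * sum(v * v for v in alpha),
    "SSR": (r - 1) * sigma ** 2 + c * sum(v * v for v in beta),
    "SSE": (c - 1) * (r - 1) * sigma ** 2,
}
print(averages)
print(expected)
```

With these choices the expected values are 4, 9, and 6; the simulated averages should agree to within sampling error, which shrinks as the number of repetitions grows.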


From the partition theory of the distribution of Chap. 10 it can be shown 
that the three sum of squares on the right-hand side of Eq. (12.24) are 
independently distributed. If, in addition to the assumptions listed at the 
beginning of this paragraph, we assume that the cr populations are normally 
distributed, then the ratios 

\[ \frac{SSC}{\sigma^2 + \dfrac{r \sum_i \alpha_i^2}{c - 1}}, \qquad \frac{SSR}{\sigma^2 + \dfrac{c \sum_j \beta_j^2}{r - 1}}, \qquad \frac{SSE}{\sigma^2} \tag{12.29} \]

are independently distributed as chi-squares with c - 1, r - 1, and (c - 1) 
(r - 1) degrees of freedom, respectively (the first two exactly so when the 
corresponding effects are all zero). Further, note that if the null 
hypotheses (12.15) and (12.16) hold, that is, if 

\[ \sum_i \alpha_i^2 = \sum_j \beta_j^2 = 0 \]

then the ratios 

\[ s_c^2 = \frac{SSC}{c - 1}, \qquad s_r^2 = \frac{SSR}{r - 1}, \qquad s_e^2 = \frac{SSE}{(c - 1)(r - 1)} \tag{12.30} \]

are independent unbiased estimators of $\sigma^2$. We bring together in Table 12.7 
the analysis of variance and expected mean squares for a fixed model 
experiment in a two-way classification design with one observation per cell. 
Again, we see that the degrees of freedom identity 

\[ cr - 1 = (c - 1) + (r - 1) + (c - 1)(r - 1) \tag{12.31} \]

can be used with the sum of squares identity to obtain the appropriate mean 
squares in the analysis of variance. 


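Table 12.7 

Analysis of Variance for Two-Way Classification 

Source of          Sum of                   Degrees of         Mean       Expected Mean Square 
Variation          Squares                  Freedom            Square     for Fixed Model 
Column effects     SSC                      c - 1              $s_c^2$    $\sigma^2 + r \sum_i \alpha_i^2 / (c - 1)$ 
Row effects        SSR                      r - 1              $s_r^2$    $\sigma^2 + c \sum_j \beta_j^2 / (r - 1)$ 
Error              SSE (by subtraction)     (c - 1)(r - 1)     $s_e^2$    $\sigma^2$ 
Total              SST                      cr - 1 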






If the null hypothesis (12.15) that the column effects are equal to zero 
holds, the ratio $s_c^2/s_e^2$ has the F distribution with c - 1 and (c - 1)(r - 1) 
degrees of freedom. Also, if the null hypothesis (12.16) that the row effects 
are equal to zero holds, the ratio $s_r^2/s_e^2$ has the F distribution with r - 1 
and (c - 1)(r - 1) degrees of freedom. Thus, the F distribution may be 
used in the usual way to test the hypotheses (12.15) and (12.16) if Eq. (12.14) 
is a fixed effects model equation with random variables $\varepsilon_{11}, \ldots, \varepsilon_{cr}$ normally 
and independently distributed with common means of zero and common 
variances of $\sigma^2$. 

Confidence intervals for any linear combination of row means or any 
linear combination of column means may be found by the method of Sect. 
10.6. The error variance estimator $s_e^2$ with (c - 1)(r - 1) degrees of freedom 
obtained in the analysis of variance is used for all confidence problems. 
Thus, if 

\[ \Gamma_c = m_1 \mu_{1.} + \cdots + m_c \mu_{c.} \tag{12.32} \]

is any linear combination of true column means and 

\[ C_c = m_1 \bar{x}_{1.} + \cdots + m_c \bar{x}_{c.} \]

is the corresponding linear combination of estimated column means, then 
the statistic 

\[ t = \frac{C_c - \Gamma_c}{\sqrt{\dfrac{s_e^2 \sum_i m_i^2}{r}}} \tag{12.33} \]

has the t distribution with (c - 1)(r - 1) degrees of freedom. Thus, the 
100 (1 - $\alpha$) per cent confidence limits for Eq. (12.32) are given by 

\[ C_c \pm t_{\alpha/2} \sqrt{\frac{s_e^2 \sum_i m_i^2}{r}} \tag{12.34} \]

where $t_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the t distribution with 
(c - 1)(r - 1) degrees of freedom. The expression in (12.34) may be used 
to find confidence limits of single means $\mu_{i.}$ or differences in means 
$\mu_{i.} - \mu_{i'.}$ where $i' \neq i$. Likewise, the 100 (1 - $\alpha$) per cent confidence limits 
of any linear combination of row means 

\[ \Gamma_r = m'_1 \mu_{.1} + \cdots + m'_r \mu_{.r} \]

are given by 

\[ \sum_{j=1}^{r} m'_j \bar{x}_{.j} \pm t_{\alpha/2} \sqrt{\frac{s_e^2 \sum_j m_j'^2}{c}} \tag{12.35} \]

where $t_{\alpha/2}$ is defined as in (12.34). The reader should remember that (12.34) 
and (12.35) give valid confidence limits when Eq. (12.14) is a fixed effects 
model equation with random variables $\varepsilon_{11}, \ldots, \varepsilon_{cr}$ normally and 
independently distributed with common means of zero and common variances of $\sigma^2$. 
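As an illustration of the confidence limits for a linear combination of column means, the Python sketch below computes 95 per cent limits for the difference between the mean acidity at 1 in. and at 13 in., using the depth means and error mean square of Example 12.1. The value 2.447 for the upper 2.5 per cent point of t with six degrees of freedom is taken from a standard t table and is an assumption here, since the text does not quote it; the variable names are illustrative.

```python
import math

ms_error = 0.08 / 6      # s_e^2 with (3 - 1)(4 - 1) = 6 degrees of freedom
r = 4                    # number of observations in each column (depth) mean
t_crit = 2.447           # upper 2.5% point of t, 6 d.f. (standard table; assumed)

depth_means = [6.6, 6.6, 6.3]    # estimated column means from Table 12.2
weights = [1, 0, -1]             # mean at 1 in. minus mean at 13 in.

# Estimated linear combination and its standard error.
estimate = sum(m * x for m, x in zip(weights, depth_means))
std_error = math.sqrt(ms_error * sum(m * m for m in weights) / r)

lower = estimate - t_crit * std_error
upper = estimate + t_crit * std_error
print(round(estimate, 3), round(std_error, 4), round(lower, 3), round(upper, 3))
```

The divisor r reflects the number of observations in each column mean; for a combination of row means the divisor would be c instead.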

The methods of Sect. 10.8 for obtaining simultaneous confidence intervals 
(or for testing simultaneous hypotheses) may be used with either the column 
means or the row means, provided that the error variance $s_e^2$ with 
(c - 1)(r - 1) degrees of freedom replaces $s^2$ with k(n - 1) degrees of 
freedom, and that the error variance for the means is obtained by dividing 
$s_e^2$ by the number of observations used in computing an individual mean. 
Also, the multiple test procedures for means described in Sect. 10.9 may be 
applied, provided that $s_e^2$ of the two-way classification replaces $s^2$ of the 
one-way classification. (The reader will not have any trouble with 
applications of these methods if he thinks of doing two one-way classification 
problems in which the same error term $s_e^2$ is used and the divisor depends 
on the means.) Usually, simultaneous or multiple methods are applied 
either to the column means or to the row means, but not to both. 

Power of the tests of hypotheses (12.15) and (12.16) may be obtained by 
the methods of Sect. 10.4. Also, Table VIII may be used to find power and 
the size of the type 2 error in a two-way classification. 

The reader has probably wondered why the randomized block design 
was discussed in Example 12.1 and then was followed with a theoretical 
development of the two-way classification design. Actually, the randomized 
block design is a special case of the two-way classification design, and the 
term randomized block is used when one category of classification is of 
primary importance and the other is of secondary importance. However, 
in some experiments both factors may be of equal importance, in which 
case both factors are analyzed. In this case we may think of the experiment 
as a two-factor factorial experiment with no interaction. Factorial 
experiments are discussed in Chap. 13. 

12.4. USE OF EFFECTS IN UNDERSTANDING ANALYSIS OF VARIANCE 

In Example 12.1 of Sect. 12.2 we used the raw data and totals to prepare 
an analysis of variance table, and, in general, we discussed the problem in 
terms of means. The method given is perhaps clear, but a better 
understanding of the analysis may be obtained by using a more lengthy analysis 
in terms of effects. (This section may be omitted without destroying the 
continuity of the subject.) 

We first note that the block and treatment sum of squares of Table 12.3 
may be found directly from the estimated effects and means in Table 12.2. 
Using Eq. (12.25), we find that the treatment and block sum of squares are, 
respectively 

\[ SSC = r \sum_i a_i^2 = 4[(0.1)^2 + (0.1)^2 + (-0.2)^2] = 0.24 \]

and 

\[ SSR = c \sum_j b_j^2 = 3[(6.5 - 6.5)^2 + (6.4 - 6.5)^2 + (6.7 - 6.5)^2 + (6.4 - 6.5)^2] = 0.18 \]

These are shortcut formulas for finding the sum of squares of the estimated 
treatment and block effects of each observation in the analysis. 

The residual (error) sum of squares may be obtained by computing the 
sum of squares of all the observations after the differences in row and column 
means have been removed, that is, after the observations have been adjusted 
so that all the column means are the same and all the row means are the same. 
First, we change the observations in Table 12.2 so that all row (block) 
means are the same, and are equal to the grand mean $\bar{x} = 6.5$. Thus, we leave 
the observations in the first row alone, add 0.1 to each value in the second 
row, subtract 0.2 from each value in the third row, and add 0.1 to each value 
in the fourth row. The resulting observations are recorded in Table 12.8 
along with the new totals and adjusted row means. Note that the column 


Table 12.8 

Acidity Level in Table 12.2 Adjusted so that Row Means Are Constant 

Core            Depth of Soil in Inches        Totals     Adjusted 
Sample          1         7         13                    Means 
1               6.7       6.5       6.3        19.5       6.5 
2               6.5       6.7       6.3        19.5       6.5 
3               6.7       6.6       6.2        19.5       6.5 
4               6.5       6.6       6.4        19.5       6.5 
Totals          26.4      26.4      25.2       78.0 
Means           6.6       6.6       6.3                   6.5 


and grand means remain unchanged. That is, the block effect can be removed 
without changing the treatment means and grand mean. Now the variance 
of the adjusted sample means for the blocks is zero, but the variance of the 
treatment means remains the same. 


Table 12.9 

Acidity Level Adjusted so that Row and Column Means Are Constant 

Core              Depth of Soil in Inches        Totals     Adjusted 
Sample            1         7         13                    Means 
1                 6.6       6.4       6.5        19.5       6.5 
2                 6.4       6.6       6.5        19.5       6.5 
3                 6.6       6.5       6.4        19.5       6.5 
4                 6.4       6.5       6.6        19.5       6.5 
Totals            26.0      26.0      26.0       78.0 
Adjusted means    6.5       6.5       6.5                   6.5 


In a similar manner, change the values in Table 12.8 so that the column 
means, as well as the row and over-all means, are all equal. Thus, subtracting 
0.1 from each value in the first two columns and adding 0.2 to each value 
in the third column gives the values recorded in Table 12.9. The remaining 
variation of the values in Table 12.9 is not due to differences in row or column 
means, since all these means are now the same. It is clearly due to 
experimental errors (random sampling). 

In order to discuss the experimental variations in terms of deviates, 
$e_{ij}$, we subtract 6.5 from each value in Table 12.9 to obtain Table 12.10. 
Now, the residual (error) sum of squares is obtained directly from Table 
12.10 as 

\[ SSE = \sum_i \sum_j e_{ij}^2 = (0.1)^2 + (-0.1)^2 + \cdots + (0.1)^2 = 0.08 \]

This is the value found in Table 12.3. 
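The adjustments that lead from Table 12.2 to Tables 12.8 and 12.9 and then to the residuals can be reproduced with a short Python sketch. The helper names `center_rows` and `center_columns` are illustrative, not from the text.

```python
def center_rows(table, target):
    """Shift every row so that its mean equals target."""
    return [[x - (sum(row) / len(row) - target) for x in row] for row in table]

def center_columns(table, target):
    """Shift every column so that its mean equals target."""
    r = len(table)
    col_means = [sum(row[i] for row in table) / r for i in range(len(table[0]))]
    return [[x - (col_means[i] - target) for i, x in enumerate(row)]
            for row in table]

# Acidity data of Table 12.2: rows are core samples, columns are depths.
acidity = [
    [6.7, 6.5, 6.3],
    [6.4, 6.6, 6.2],
    [6.9, 6.8, 6.4],
    [6.4, 6.5, 6.3],
]
grand_mean = sum(map(sum, acidity)) / 12

table_12_8 = center_rows(acidity, grand_mean)          # block effect removed
table_12_9 = center_columns(table_12_8, grand_mean)    # treatment effect removed
residuals = [[x - grand_mean for x in row] for row in table_12_9]
sse = sum(e * e for row in residuals for e in row)

for row in table_12_9:
    print([round(x, 1) for x in row])
print(round(sse, 2))
```

Removing the row effects leaves the column means unchanged, so the two adjustments can be made in either order.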


Table 12.10 

Residuals for the Acidity Level Experiment 

Core              Depth of Soil in Inches        Totals     Adjusted 
Sample            1         7         13                    Means 
1                 0.1       -0.1      0.0        0.0        0.0 
2                 -0.1      0.1       0.0        0.0        0.0 
3                 0.1       0.0       -0.1       0.0        0.0 
4                 -0.1      0.0       0.1        0.0        0.0 
Totals            0.0       0.0       0.0        0.0 
Adjusted means    0.0       0.0       0.0        0.0        0.0 


Something which is easily verified for the general case is obvious in this 
problem, namely that 

\[ \begin{aligned}
\sum_i a_i &= 0.1 + 0.1 + (-0.2) = 0 \\
\sum_j b_j &= 0.0 + (-0.1) + 0.2 + (-0.1) = 0 \\
\sum_i e_{ij} &= 0 \quad (j = 1, \ldots, 4) \quad \text{and} \quad \sum_j e_{ij} = 0 \quad (i = 1, 2, 3)
\end{aligned} \tag{12.36} \]

Thus, there is one linear restriction among the a's, one linear restriction 
among the b's, and 4 + 3 (in general, r + c) linear restrictions among the 
e's. Only 4 + 3 - 1 (in general, r + c - 1) of the linear restrictions among 
the e's are independent. That is, if six of the restrictions among the e's are 
given, the seventh follows. 
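The count of independent restrictions among the residuals can be verified by computing the rank of the restriction matrix. The Python sketch below is an illustration only: the helper names are hypothetical, the residuals are stored in a single list with cell (i, j) at position j * c + i, and exact fractions are used so that the rank is not affected by rounding.

```python
from fractions import Fraction

def matrix_rank(matrix):
    """Rank by Gauss-Jordan elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((k for k in range(rank, len(rows)) if rows[k][col] != 0),
                     None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for k in range(len(rows)):
            if k != rank and rows[k][col] != 0:
                factor = rows[k][col] / rows[rank][col]
                rows[k] = [u - factor * v for u, v in zip(rows[k], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank

def residual_restrictions(c, r):
    """One row per restriction: c column sums and r row sums, each set to zero."""
    col_sums = [[1 if k % c == i else 0 for k in range(c * r)] for i in range(c)]
    row_sums = [[1 if k // c == j else 0 for k in range(c * r)] for j in range(r)]
    return col_sums + row_sums

c, r = 3, 4
independent = matrix_rank(residual_restrictions(c, r))
error_df = c * r - independent
print(independent, error_df)
```

For the acidity experiment this gives six independent restrictions among twelve residuals, leaving six degrees of freedom for error, which equals (c - 1)(r - 1).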

In determining the divisor of SSE so as to make the resulting mean square 
an unbiased estimator of $\sigma^2$, we may argue as follows. The 12 residual values 










in Table 12.10 have been corrected for three column means which must 
average 6.5, for four row means which must average 6.5, and for one 
over-all mean which is 6.5. The degrees of freedom for the sum of squares SSE 
is thus 12 - [(3 - 1) + (4 - 1) + 1] = 6 {in general, cr - [(c - 1) + 
(r - 1) + 1] = (c - 1)(r - 1)}. Thus, due to the additive nature of our 
model, the randomness assumption, and the equal variance assumption, it 
is possible to find an unbiased estimator of the common variance $\sigma^2$ 
when only one observation is made on each population; this is a most 
important property. 

The residuals may be found directly from 

\[ e_{ij} = x_{ij} - \hat{x}_{ij} \tag{12.37} \]

Substituting the estimated effects $\bar{x}$, $a_1$, $a_2$, $a_3$, $b_1$, $b_2$, $b_3$, $b_4$ in Formula (12.19), 
we obtain the estimated cell means shown in Table 12.11. Then the means 
in Table 12.11 may be subtracted term by term from the observations in 
Table 12.2 to give the residuals of Table 12.10. It should be noted that we 
may also write 

\[ e_{ij} = x_{ij} - \bar{x} - a_i - b_j \]

or 

\[ e_{ij} = x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x} \tag{12.38} \]

Also, note that the unbiased estimators of the cell means shown in Table 
12.11 may be obtained even though a single random observation is made 
in each population. 


Table 12.11 

Estimates of Population Means for the Acidity Experiment 

Core                              Depth of Soil in Inches 
Sample     1                         7                         13 
1          6.6 = 6.5 + 0.1 + 0.0     6.6 = 6.5 + 0.1 + 0.0     6.3 = 6.5 - 0.2 + 0.0 
2          6.5 = 6.5 + 0.1 - 0.1     6.5 = 6.5 + 0.1 - 0.1     6.2 = 6.5 - 0.2 - 0.1 
3          6.8 = 6.5 + 0.1 + 0.2     6.8 = 6.5 + 0.1 + 0.2     6.5 = 6.5 - 0.2 + 0.2 
4          6.5 = 6.5 + 0.1 - 0.1     6.5 = 6.5 + 0.1 - 0.1     6.2 = 6.5 - 0.2 - 0.1 


12.5. CALCULATION OF THE STANDARD ERROR FOR A COMPARISON OF 
MEANS 

At the end of Sect. 12.3 it was stated that the error mean square found 
in the analysis of variance of the two-way classification could be used to 
establish confidence intervals for and test hypotheses about linear 
combinations of means. If the linear combination is a contrast, a special standard 
error which is appropriate for the particular contrast may be computed. 
Such a standard error should be applied when the variances for treatments 
are definitely not equal. To illustrate the techniques we analyze the data 
in the following example. 

Example 12.2. In an experiment (hypothetical) each of four observers 
made counts of bacteria in milk on each of six plates. The plates were 
placed in a fixed position along a table illuminated by daylight so that each 
plate would be counted under nearly the same conditions. The observers 
moved from plate to plate in a random order. The bacterial counts along 
with totals and means are recorded in Table 12.12. 


Table 12.12 

Bacterial Counts in Milk 



We wish to compare the observers in their ability to count bacteria. 
Suppose it is known that A is a regular counter and that B, C, and D are 
new counters. Further, suppose that D is a woman and B and C are boys. 
Then, three hypotheses of interest, stated in terms of orthogonal 
comparisons, are as follows 

\[ \begin{aligned}
H_{01}&: 3\mu_A - \mu_B - \mu_C - \mu_D = 0 \\
H_{02}&: -\mu_B - \mu_C + 2\mu_D = 0 \\
H_{03}&: \mu_B - \mu_C = 0
\end{aligned} \tag{12.39} \]

Test the three hypotheses in (12.39), using the standard error appropriate 
to each. 

First, we construct the usual analysis of variance table (Table 12.13) for 
a fixed effects randomized blocks design. Then we use Eq. (10.68) to compute 
the following three components of treatment sum of squares, each with an 
individual degree of freedom 

\[ Q_1^2 = \frac{\left(\sum_k m_{1k} T_k\right)^2}{6 \sum_k m_{1k}^2} = \frac{[3(1410) - 1(1110) - 1(1050) - 1(1230)]^2}{6[3^2 + (-1)^2 + (-1)^2 + (-1)^2]} = \frac{840^2}{6 \cdot 12} = 9800 = \text{"A vs. B, C, and D" SS} \]

\[ Q_2^2 = \frac{300^2}{6 \cdot 6} = 2500 \]

\[ Q_3^2 = \frac{60^2}{6 \cdot 2} = 300 \]

where $T_k$ denotes the total for observer k and $m_{1k}$ the coefficient of that total in the first comparison. 
Note that the treatment SS = $Q_1^2 + Q_2^2 + Q_3^2$; that is 
9800 + 2500 + 300 = 12,600 
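The three single-degree-of-freedom components can be recomputed from the observer totals with a short Python sketch. The function names and comparison labels are illustrative; the totals and coefficients are those used in the computations above.

```python
observer_totals = {"A": 1410, "B": 1110, "C": 1050, "D": 1230}
plates = 6   # number of observations in each observer total

comparisons = {
    "A vs. B, C, and D": {"A": 3, "B": -1, "C": -1, "D": -1},
    "D vs. B and C": {"A": 0, "B": -1, "C": -1, "D": 2},
    "B vs. C": {"A": 0, "B": 1, "C": -1, "D": 0},
}

def contrast_ss(coeffs, totals, n):
    """Sum of squares, with one degree of freedom, for a comparison of totals."""
    numerator = sum(coeffs[k] * totals[k] for k in totals) ** 2
    return numerator / (n * sum(m * m for m in coeffs.values()))

def is_orthogonal(m1, m2):
    """Two comparisons are orthogonal when their coefficient products sum to zero."""
    return sum(m1[k] * m2[k] for k in m1) == 0

q_squared = {name: contrast_ss(m, observer_totals, plates)
             for name, m in comparisons.items()}

# Treatment (observer) sum of squares computed from the totals.
grand_total = sum(observer_totals.values())
treatment_ss = (sum(t * t for t in observer_totals.values()) / plates
                - grand_total ** 2 / (plates * len(observer_totals)))

names = list(comparisons)
all_orthogonal = all(is_orthogonal(comparisons[p], comparisons[q])
                     for idx, p in enumerate(names) for q in names[idx + 1:])
print(q_squared, treatment_ss, all_orthogonal)
```

Because the three comparisons are mutually orthogonal and there are three degrees of freedom among four observers, their sums of squares add up to the observer sum of squares.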


Table 12.13 

Preliminary Analysis of Variance for Data of Table 12.12 

Source of      Sum of      Degrees of    Mean       $F_c$       $F_{.05}(3, 15)$    $F_{.01}(3, 15)$ 
Variation      Squares     Freedom       Square 
Plates         173,800     5             34,760 
Observer       12,600      3             4,200      24.14**     3.29                5.42 
Error          2,620       15            174 
Total          189,020     23 

** denotes "highly significant." This means that the usual hypothesis of equality 
of observer means is rejected at the one per cent level. 


Next, in order to partition the error sum of squares, use the coefficients
(see Table 12.14) which define the three linear comparisons. Then compute
the three comparison totals (see Table 12.15) within each of the six blocks
in the same way as the numerator total for each $Q^2$. For example, find for
the first comparison in plate 1 that

$$3(340) - 1(250) - 1(248) - 1(282) = 240$$

and for the third comparison in plate 6 that

$$1(73) - 1(64) = 9$$


Table 12.14

Coefficients of Three Orthogonal Comparisons

              Coefficients for Observer    Sum of
Comparison    A      B      C      D       Squares

1             3     -1     -1     -1       12
2             0     -1     -1      2        6
3             0      1     -1      0        2


Table 12.15

Comparison Totals for Plates in Table 12.12

Plate             Comparison
Number       1        2        3

1           240       66        2
2           156       63       27
3           116       98      -10
4           108        6       10
5            84        6       22
6           136       61        9

Totals      840      300       60


Note that the totals in Table 12.15 are the same as the totals in the numerators
of the $Q^2$'s; this serves as a check.

Now, the experimental error sum of squares associated with the first
comparison is given by

$$
\frac{240^2 + 156^2 + \cdots + 136^2 - \dfrac{840^2}{6}}{12} = 1250.67
$$

where the divisor 12 is the sum of squares of the coefficients of the
comparison. In a similar way, we find the error sum of squares associated with
the second and third comparisons to be 1120.33 and 449.00, respectively. The
sum of these three components is equal to the error sum of squares, since
the comparisons are orthogonal. The number of degrees of freedom associated
with each error component is five, and the complete analysis of
variance, along with the expected mean squares, is shown in Table 12.16.


Table 12.16

Complete Analysis of Variance for the Hypotheses in (12.39)

Source of          Sum of     Degrees of   Mean      Expected Mean Square for Fixed
Variation          Squares    Freedom      Square    Model (Equal Variances)

Plates             173,800     5           34,760    $\sigma^2 + 4\sum_j \beta_j^2/5$
Observers           12,600     3
  Comparison 1       9,800     1            9,800    $\sigma^2 + 36(3\alpha_1 - \alpha_2 - \alpha_3 - \alpha_4)^2/72$
  Comparison 2       2,500     1            2,500    $\sigma^2 + 36(-\alpha_2 - \alpha_3 + 2\alpha_4)^2/36$
  Comparison 3         300     1              300    $\sigma^2 + 36(\alpha_2 - \alpha_3)^2/12$
Error                2,620    15              174
Error for
  Comparison 1       1,251     5              250    $\sigma^2$
  Comparison 2       1,120     5              224    $\sigma^2$
  Comparison 3         449     5               90    $\sigma^2$
Total              189,020    23

It is clear from Table 12.16 that the F distribution with one and five
degrees of freedom should be used in testing each of the hypotheses in (12.39).
The five per cent and one per cent upper F values are $F_{.05} = 6.61$ and
$F_{.01} = 16.26$. For the first comparison the computed F is 9800/250 = 39.2.
Since 39.2 > 16.26, comparison one is highly significant, and we conclude
that observer A counts significantly more bacteria than the average of the
other three observers. Since the computed F values for the second and third
comparisons are 2500/224 = 11.2 and 300/90 = 3.3, respectively, we say that
comparison two is significant and that comparison three is not significant.
Hence, we conclude that the new woman counts significantly more bacteria than
the two boys on the average, but the two boys do not differ significantly in
their counts. Note that had the error mean square of 174 with 15 degrees of
freedom been used in each case, the conclusions would be very much the
same, except comparison two would be declared highly significant rather
than significant.
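The arithmetic of Example 12.2 can be retraced with a short script. This is only an illustrative sketch: the observer totals, the coefficients of Table 12.14, and the per-plate comparison totals of Table 12.15 are copied from the text, while the names `totals`, `contrasts`, `plate_totals`, and `analyse` are invented here for the illustration.

```python
# Illustrative recomputation of the single-degree-of-freedom analysis.
# Observer totals over the six plates (taken from the worked example).
totals = {"A": 1410, "B": 1110, "C": 1050, "D": 1230}
r = 6  # number of plates (blocks)

# Coefficients of the three orthogonal comparisons (Table 12.14).
contrasts = {
    1: {"A": 3, "B": -1, "C": -1, "D": -1},
    2: {"A": 0, "B": -1, "C": -1, "D": 2},
    3: {"A": 0, "B": 1, "C": -1, "D": 0},
}

# Comparison totals within each plate (Table 12.15).
plate_totals = {
    1: [240, 156, 116, 108, 84, 136],
    2: [66, 63, 98, 6, 6, 61],
    3: [2, 27, -10, 10, 22, 9],
}


def analyse(k):
    """Return Q^2, the error SS and MS, and the F ratio for comparison k."""
    m = contrasts[k]
    sum_m2 = sum(v * v for v in m.values())          # sum of squared coefficients
    numerator = sum(m[obs] * totals[obs] for obs in totals)
    q2 = numerator ** 2 / (r * sum_m2)               # treatment component
    per_plate = plate_totals[k]
    error_ss = (sum(t * t for t in per_plate) - sum(per_plate) ** 2 / r) / sum_m2
    error_ms = error_ss / (r - 1)                    # r - 1 = 5 degrees of freedom
    return {"Q2": q2, "error_ss": error_ss, "error_ms": error_ms, "F": q2 / error_ms}


results = {k: analyse(k) for k in contrasts}
for k, res in results.items():
    print(f"comparison {k}: Q2 = {res['Q2']:.0f}, error SS = {res['error_ss']:.2f}, "
          f"error MS = {res['error_ms']:.1f}, F = {res['F']:.1f}")
```

Each F ratio is then compared with the tabled F value for one and five degrees of freedom, as in the paragraph above.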

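The tests in the last paragraph were made with Table 12.16 as a guide,
even though the expected mean squares were computed on the assumption
that the variances were equal. It turns out that the same tests are
appropriate when the $cr$ populations have variances which are not all equal.
For, if we let the variance of the population in the $i$th column and $j$th
row be $\sigma_{ij}^2$ ($i = 1, \ldots, c$; $j = 1, \ldots, r$), it can be
shown, under the assumption that the null hypothesis is true, that the
expected values of the two mean squares used in the F test of any linear
comparison of means are the same linear combinations of the variances
$\sigma_{ij}^2$. For example, for comparison two it follows that

$$
\begin{aligned}
E(Q_2^2) &= \frac{1}{36} E\Big\{\Big[-\Big(6\mu + 6\alpha_2 + \sum_j \beta_j + \sum_j e_{2j}\Big)
            - \Big(6\mu + 6\alpha_3 + \sum_j \beta_j + \sum_j e_{3j}\Big) \\
         &\qquad\qquad + 2\Big(6\mu + 6\alpha_4 + \sum_j \beta_j + \sum_j e_{4j}\Big)\Big]^2\Big\} \\
         &= \frac{1}{36} E\Big\{\Big[6(-\alpha_2 - \alpha_3 + 2\alpha_4)
            + \Big(-\sum_j e_{2j} - \sum_j e_{3j} + 2\sum_j e_{4j}\Big)\Big]^2\Big\} \\
         &= (-\alpha_2 - \alpha_3 + 2\alpha_4)^2
            + \frac{1}{36}\Big(\sum_j \sigma_{2j}^2 + \sum_j \sigma_{3j}^2 + 4\sum_j \sigma_{4j}^2\Big)
\end{aligned}
$$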


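and

$$
\begin{aligned}
E(\text{error mean square for comparison 2})
  &= E\left\{\frac{\sum_j (-x_{2j} - x_{3j} + 2x_{4j})^2
       - \dfrac{\big[\sum_j (-x_{2j} - x_{3j} + 2x_{4j})\big]^2}{6}}{5[(-1)^2 + (-1)^2 + 2^2]}\right\} \\
  &= \frac{1}{30}\Big\{E\sum_j \big[(-\alpha_2 - \alpha_3 + 2\alpha_4) + (-e_{2j} - e_{3j} + 2e_{4j})\big]^2
       - \frac{1}{6}E\big[(-T_2 - T_3 + 2T_4)^2\big]\Big\} \\
  &= \frac{1}{30}\Big\{6(-\alpha_2 - \alpha_3 + 2\alpha_4)^2 + \sum_j (\sigma_{2j}^2 + \sigma_{3j}^2 + 4\sigma_{4j}^2) \\
  &\qquad\quad - \frac{1}{6}\Big[6^2(-\alpha_2 - \alpha_3 + 2\alpha_4)^2 + \sum_j (\sigma_{2j}^2 + \sigma_{3j}^2 + 4\sigma_{4j}^2)\Big]\Big\} \\
  &= \frac{1}{36}\sum_j (\sigma_{2j}^2 + \sigma_{3j}^2 + 4\sigma_{4j}^2)
\end{aligned}
$$

where $T_i$ denotes the total for column $i$. Thus, when the null hypothesis
$-\mu_{2.} - \mu_{3.} + 2\mu_{4.} = 0$ is true, we have

$$E(Q_2^2) = E(\text{error mean square for comparison 2}) = \bar{\sigma}^2$$

where

$$\bar{\sigma}^2 = \frac{\sum_j (\sigma_{2j}^2 + \sigma_{3j}^2 + 4\sigma_{4j}^2)}{36}$$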
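As a brief check, added here as an illustration, consider the equal-variance case assumed in Table 12.16. Putting $\sigma_{ij}^2 = \sigma^2$ for every cell, with six rows, the expected value found for comparison two reduces to the single variance $\sigma^2$:

$$
\frac{\sum_{j=1}^{6} (\sigma^2 + \sigma^2 + 4\sigma^2)}{36} = \frac{6 \cdot 6\sigma^2}{36} = \sigma^2
$$

This agrees with the expected error mean square entered for comparison two in Table 12.16.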

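In general, for any comparison

$$\gamma = m_1\mu_{1.} + m_2\mu_{2.} + \cdots + m_c\mu_{c.}$$

the component of the treatment mean square is

$$Q^2 = \frac{\left(\sum_i m_i T_i\right)^2}{r\sum_i m_i^2}$$

where $T_i$ is the total for column $i$, and the associated error mean square is

$$
\frac{\sum_j \left(\sum_i m_i x_{ij}\right)^2 - \dfrac{\left(\sum_i m_i T_i\right)^2}{r}}{(r - 1)\sum_i m_i^2}
\tag{12.40}
$$

It is easy to show, by the methods of Sect. 11.3, that

$$
E\Big[\Big(\sum_i m_i T_i\Big)^2\Big] = r^2\Big(\sum_i m_i\alpha_i\Big)^2 + \sum_i\sum_j m_i^2\sigma_{ij}^2
\tag{12.41}
$$

and

$$
E\Big[\sum_j \Big(\sum_i m_i x_{ij}\Big)^2\Big] = r\Big(\sum_i m_i\alpha_i\Big)^2 + \sum_i\sum_j m_i^2\sigma_{ij}^2
\tag{12.42}
$$

Thus, it follows that

$$
E(Q^2) = \frac{r\left(\sum_i m_i\alpha_i\right)^2}{\sum_i m_i^2} + \frac{\sum_i\sum_j m_i^2\sigma_{ij}^2}{r\sum_i m_i^2}
\tag{12.43}
$$

and

$$
E(\text{error mean square}) = \frac{\sum_i\sum_j m_i^2\sigma_{ij}^2}{r\sum_i m_i^2}
\tag{12.44}
$$

If $\gamma = 0$, then $\sum_i m_i\alpha_i = 0$ and the ratio of $Q^2$ to the
error mean square has an F distribution with 1 and $r - 1$ degrees of freedom,
provided the $e_{ij}$ in the fixed model equation (12.14) are normally and
independently distributed.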
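The two expectations in this derivation can be evaluated numerically for any set of coefficients and cell variances. The sketch below is illustrative only: the function name `expected_mean_squares`, the effects, and the variances are hypothetical and were chosen so that the null hypothesis of comparison two holds.

```python
def expected_mean_squares(m, alpha, var):
    """Expected treatment component and expected error mean square.

    m     : comparison coefficients m_i (they must sum to zero)
    alpha : fixed column effects alpha_i
    var   : var[i][j] is the variance of the cell in column i and row j
    """
    c, r = len(m), len(var[0])
    sum_m2 = sum(mi * mi for mi in m)
    gamma = sum(mi * ai for mi, ai in zip(m, alpha))
    weighted = sum(m[i] ** 2 * var[i][j] for i in range(c) for j in range(r))
    e_error = weighted / (r * sum_m2)                 # expected error mean square
    e_q2 = r * gamma ** 2 / sum_m2 + e_error          # expected value of Q^2
    return e_q2, e_error


# Hypothetical case: comparison two, with unequal variances from column to column.
m = [0, -1, -1, 2]
alpha_null = [-3.0, 1.0, 1.0, 1.0]                    # -a2 - a3 + 2*a4 = 0
var = [[1.0] * 6, [2.0] * 6, [3.0] * 6, [4.0] * 6]
e_q2, e_error = expected_mean_squares(m, alpha_null, var)
print(e_q2, e_error)
```

With these hypothetical inputs the two expectations coincide, which is the property that justifies the F test when the variances are unequal. Replacing `alpha_null` by effects for which the comparison is not zero makes the first expectation larger than the second.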

In practice, the method of partitioning the error sum of squares should
not be applied unless the population variances $\sigma_{ij}^2$ can be assumed
to be quite different. A small degree of heterogeneity among the variances
$\sigma_{ij}^2$ does not usually noticeably disturb a test in which the regular
error mean square is used. Thus, when one is deciding whether to use the
regular error mean square in preference to a component of the error mean
square, it should be realized that a sizeable loss in degrees of freedom in the
error term is probably worse than small heterogeneity among population
variances.


12.6. MISSING DATA IN A TWO-WAY CLASSIFICATION DESIGN, SINGLE 
OBSERVATION 

Sometimes the original design of an experiment is destroyed because 
one or more of the observations is missing or unreliable, due to accident. 
Fortunately, it is possible to describe methods which give adequate (either 
fairly short approximations or longer exact) analyses of experiments with 
missing data. In the procedure we describe, the available observations are 
used to find estimates (dummy variables) of the missing observations so that 
the regular analysis can be performed on cr values in the resulting augmented 
table. Allen and Wishart [1] first presented the method and Yates [40,41] 
and others [4, 5, 18] developed it. Now, using the data of Example 12.1,
we explain the method for the case where only one value is missing.





Example 12.3. Illustrate the procedure for missing values if it is supposed
that the observation in the first column and third row is missing from Table
12.1. Assume that the data are now for a fixed model.

Denoting the missing observation by y, we compute the totals and means
shown in Table 12.17. It is reasonable to suppose that y should be a value
with error component $e_{13}$ equal to zero. This selection of y is an unbiased
estimate of the cell mean, and, furthermore, the error sum of squares of the
available data is not affected by the addition of this dummy value of y.


Table 12.17

Data of Table 12.1 with y Substituted for Missing Value

Core                 Depth of Soil
Number      1               2       3       Totals      Means

1           6.7             6.5     6.3     19.5        6.5
2           6.4             6.6     6.2     19.2        6.4
3           y               6.8     6.4     13.2 + y    (13.2 + y)/3
4           6.4             6.5     6.3     19.2        6.4

Totals      19.5 + y        26.4    25.2    71.1 + y
Means       (19.5 + y)/4    6.6     6.3                 (71.1 + y)/12


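Thus, we write

$$y = m + a_1 + b_3$$

or

$$
y = \bar{x} + (\bar{x}_{1.} - \bar{x}) + (\bar{x}_{.3} - \bar{x}) = \bar{x}_{1.} + \bar{x}_{.3} - \bar{x}
\tag{12.45}
$$

Now, substituting the values of $\bar{x}_{1.}$, $\bar{x}_{.3}$, and $\bar{x}$ from
Table 12.17 in Eq. (12.45) gives

$$y = \frac{19.5 + y}{4} + \frac{13.2 + y}{3} - \frac{71.1 + y}{12}$$

or

$$y = 6.7$$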
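The intermediate algebra is not shown in the text; one way to carry it out, given here as an added illustration, is to clear fractions by multiplying the substituted equation through by 12:

$$
\begin{aligned}
12y &= 3(19.5 + y) + 4(13.2 + y) - (71.1 + y) \\
    &= 58.5 + 52.8 - 71.1 + 6y \\
    &= 40.2 + 6y
\end{aligned}
$$

so that $6y = 40.2$ and $y = 6.7$.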

This value of y may now be substituted in Table 12.17 and the estimates of
means and effects computed in the usual way. Also, this augmented table
may be used to compute, in the customary manner, the components SSC,
SSR, SST, and SSE in the sum of squares identity.

In order for the error mean square to be an unbiased estimate of $\sigma^2$, we
divide the error sum of squares by (3 - 1)(4 - 1) - 1 = 5, since only
five of the $e_{ij}$'s are now independent, because of the additional independent
restriction that $e_{13} = 0$. Now to test the null hypothesis
$\alpha_1 = \alpha_2 = \alpha_3 = 0$ against the alternative hypothesis that the
$\alpha_i$ are not all zero, we could use the augmented data and apply the regular
analysis of variance with the degrees of freedom associated with the error term
reduced by one. The results of such an analysis are shown in Table 12.18, and
the null hypothesis is rejected at the five per cent level. But the test
indicated is an approximate test, since the expectation of the treatment mean
square as calculated from the augmented data is greater than $\sigma^2$ when the
null hypothesis is true. Thus, the test tends to reject a true null hypothesis
too often. (In practice, if this test fails to reject the null hypothesis,
there is no need to apply an exact test.) If the number of degrees of freedom
for the error mean square is large, the approximate test just described might
be considered satisfactory in most cases. But for a small number of degrees of
freedom the test described next should be applied.


Table 12.18

Approximate Test for $H_0$: $\alpha_1 = \alpha_2 = \alpha_3 = 0$ when One Observation is Missing

Source of           Sum of     Degrees of   Mean
Variation           Squares    Freedom      Square    F_c    F.05

Blocks (cores)      0.110       3           0.037
Columns (depths)    0.207       2           0.104     8.6    5.79
Error               0.060       5           0.012
Total               0.377      10

As mentioned in the last paragraph, the treatment mean square computed
in the usual way from the augmented data is biased upwards. In order to
correct for this bias, we compute a new treatment sum of squares SSC' by
the formula

$$SSC' = SSC - \text{correction for bias}$$

where

$$\text{correction for bias} = \frac{(x_{23} + x_{33} - 2y)^2}{3 \cdot 2}$$

In our example the correction for bias is

$$\frac{[13.2 - 2(6.7)]^2}{6} = 0.007$$

The ratio

$$\frac{SSC'/(3 - 1)}{SSE/[(3 - 1)(4 - 1) - 1]}$$

is distributed as F with two and five degrees of freedom. The appropriate test
of the null hypothesis $\alpha_1 = \alpha_2 = \alpha_3 = 0$, given in Table 12.19,
shows that the treatment effects are different from zero at the five per cent
level.


Table 12.19

Test for $H_0$: $\alpha_1 = \alpha_2 = \alpha_3 = 0$ when One Observation is Missing

Source of           Sum of     Degrees of   Mean
Variation           Squares    Freedom      Square    F_c    F.05

Blocks (cores)      0.110       3
Columns (depths)    0.200       2           0.100     8.3    5.79
Error               0.060       5           0.012
Total               0.377      10

Suppose, in the general case with c columns and r rows, that a single
observation is missing from the $l$th column and $m$th row. Denote the dummy
value by $y_{lm}$. Then it can be shown that the proper missing value $x_{lm}$
can be estimated by

$$
y_{lm} = \frac{cT_{l.} + rT_{.m} - T}{(c - 1)(r - 1)}
\tag{12.46}
$$

where

$T_{l.}$ denotes the sum of the recorded observations in column $l$
$T_{.m}$ denotes the sum of the recorded observations in row $m$          (12.47)
$T$ denotes the sum of all recorded observations

Now, if $y_{lm}$ is entered in the missing cell, the augmented data can be
analyzed in the usual manner except for the two differences discussed above.
That is, find the new error mean square $s^2$ using the divisor
(c - 1)(r - 1) - 1, and subtract from the treatment sum of squares SSC the
following correction for bias:

$$
\text{correction for bias} = B = \frac{[T_{.m} - (c - 1)y_{lm}]^2}{c(c - 1)}
\tag{12.48}
$$

Then the F ratio

$$
F = \frac{(SSC - B)/(c - 1)}{SSE/[(c - 1)(r - 1) - 1]}
\tag{12.49}
$$

with c - 1 and (c - 1)(r - 1) - 1 degrees of freedom is used to test the
null hypothesis $\alpha_1 = \cdots = \alpha_c = 0$ at the desired level of
significance. The standard error of the difference in the mean of the treatment
with a missing value, $\bar{X}_{l.}$, and the mean of any other treatment,
$\bar{X}_{i.}$ ($i \neq l$), is

$$
\sqrt{s^2\left[\frac{2}{r} + \frac{c}{r(c - 1)(r - 1)}\right]}
\tag{12.50}
$$
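The single-missing-value procedure can be traced numerically with the data of Table 12.17. The following is a minimal sketch rather than a general-purpose routine; the variable names are invented for the illustration, and the layout (rows as cores, columns as depths) follows the table.

```python
# Rows are cores (blocks), columns are soil depths (treatments), as in Table 12.17.
# None marks the missing observation in the first column and third row.
table = [
    [6.7, 6.5, 6.3],
    [6.4, 6.6, 6.2],
    [None, 6.8, 6.4],
    [6.4, 6.5, 6.3],
]
r, c = len(table), len(table[0])

# Locate the single missing cell (row index mi, column index li).
mi, li = next((j, i) for j in range(r) for i in range(c) if table[j][i] is None)

# Totals of the recorded observations, as defined in (12.47).
t_col = sum(table[j][li] for j in range(r) if j != mi)
t_row = sum(table[mi][i] for i in range(c) if i != li)
t_all = sum(x for row in table for x in row if x is not None)

# Eq. (12.46): estimate of the missing value.
y = (c * t_col + r * t_row - t_all) / ((c - 1) * (r - 1))

# Usual sums of squares computed from the augmented table.
filled = [row[:] for row in table]
filled[mi][li] = y
grand = sum(sum(row) for row in filled)
correction_term = grand ** 2 / (r * c)
ssc = sum(sum(filled[j][i] for j in range(r)) ** 2 for i in range(c)) / r - correction_term
ssr = sum(sum(row) ** 2 for row in filled) / c - correction_term
sst = sum(x * x for row in filled for x in row) - correction_term
sse = sst - ssc - ssr

error_df = (c - 1) * (r - 1) - 1                       # one degree of freedom is lost
bias = (t_row - (c - 1) * y) ** 2 / (c * (c - 1))      # Eq. (12.48)
f_approx = (ssc / (c - 1)) / (sse / error_df)          # approximate test (Table 12.18)
f_corrected = ((ssc - bias) / (c - 1)) / (sse / error_df)  # Eq. (12.49) (Table 12.19)

print(f"y = {y:.1f}, SSC = {ssc:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"B = {bias:.3f}, approximate F = {f_approx:.1f}, corrected F = {f_corrected:.1f}")
```

The two F ratios correspond to the approximate test and the bias-corrected test, and both are referred to the F distribution with c - 1 and (c - 1)(r - 1) - 1 degrees of freedom.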


We now consider the cases where more than one observation is missing. 
If one or more complete rows (columns) are missing, we carry out the usual 
analysis on the reduced design, provided that at least two whole rows 
(columns) still remain. If the missing observations are in different rows or 
different columns or both, we follow the same general procedure given above. 
Appropriate formulas for filling in the missing cells are given by Yates 
[40, 42] and Glenn [23]. But rather than give these formulas we describe an 
iterative method which gives the same values. 

To illustrate, suppose three values are missing. First, place "guessed"
values in two vacant cells; the mean of all recorded observations may be
used. Then use Formula (12.46) with the resulting table to find an estimated
value for the third missing cell. Now place this value in the third cell, remove
one of the "guessed" values from a cell designated as the "first missing cell,"
and use Formula (12.46) to find an estimated value for the first missing cell.
Now place this value in the first cell, remove the "guessed" value from the
"second missing cell," and use Formula (12.46) to find an estimated value
for the second missing cell. Next, place this value in the second cell, remove
the estimated value from the third missing cell, and use Formula (12.46)
to find a second estimated value for the third missing cell. Repeat this cycle
until there is very little or no change in successive estimates in each cell.
The resulting values are the ones that would have been obtained from
cumbersome formulas.
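The cycle described above can be sketched in a few lines of code. This is an illustrative implementation under stated assumptions, not the author's program: the function name `estimate_missing`, the tolerance, and the cycle limit are hypothetical choices. As a small simplification, every vacant cell starts at the mean of the recorded observations, and the cells are then re-estimated one at a time with Formula (12.46).

```python
def estimate_missing(table, tol=1e-10, max_cycles=1000):
    """Fill cells marked None by cycling Formula (12.46) over the missing cells.

    table is a list of rows; each row lists the observations for one block.
    """
    r, c = len(table), len(table[0])
    missing = [(j, i) for j in range(r) for i in range(c) if table[j][i] is None]
    recorded = [x for row in table for x in row if x is not None]
    filled = [row[:] for row in table]

    # Starting guesses: the mean of all recorded observations.
    guess = sum(recorded) / len(recorded)
    for j, i in missing:
        filled[j][i] = guess

    for _ in range(max_cycles):
        largest_change = 0.0
        for j, i in missing:
            # Treat cell (j, i) as the only missing cell; every other cell
            # holds either a recorded value or its current estimate.
            t_col = sum(filled[k][i] for k in range(r) if k != j)
            t_row = sum(filled[j][k] for k in range(c) if k != i)
            t_all = sum(sum(row) for row in filled) - filled[j][i]
            new = (c * t_col + r * t_row - t_all) / ((c - 1) * (r - 1))
            largest_change = max(largest_change, abs(new - filled[j][i]))
            filled[j][i] = new
        if largest_change < tol:
            break
    return filled


# Data of Table 12.17 (rows are cores, columns are depths), one value missing.
table_12_17 = [
    [6.7, 6.5, 6.3],
    [6.4, 6.6, 6.2],
    [None, 6.8, 6.4],
    [6.4, 6.5, 6.3],
]
single = estimate_missing(table_12_17)
print(round(single[2][0], 4))
```

With a single missing cell the cycle settles at once on the value given by Formula (12.46). With several missing cells, the loop stops when no estimate changes by more than `tol` in a full cycle.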

In order to test the null hypothesis $\alpha_1 = \cdots = \alpha_c = 0$ with the
usual F test, we compute the component sums of squares, using the table of
observations augmented with k missing values, say. Next, find the error mean
square, using a divisor of (c - 1)(r - 1) - k, and reduce the treatment
sum of squares SSC by an amount B to correct for bias. Then the F ratio

$$F = \frac{(SSC - B)/(c - 1)}{SSE/[(c - 1)(r - 1) - k]}$$

with c - 1 and (c - 1)(r - 1) - k degrees of freedom, should be used to
avoid a biased test procedure.

In most cases where the number of degrees of freedom of the error term
is large, the approximate test, in which the correction for bias is ignored,
is satisfactory. However, if a correction seems desirable and more than one
observation is missing, the reader is referred to Yates [40, 42] for the
appropriate correction formula. Some special formulas occur in the exercises of
Sect. 12.8.


12.7. RANDOM EFFECTS AND MIXED EFFECTS MODELS IN TWO-WAY
CLASSIFICATION DESIGNS WITH A SINGLE OBSERVATION PER CELL

In Sect. 12.2 we gave an example of a mixed model in which rows (blocks)
were randomly selected and columns (treatments) were fixed. In Sect. 12.3
we developed the fixed model (both rows and columns fixed) in some detail.
The theory and methods presented in Chaps. 10, 11, and 12 (Sects. 12.1
through 12.6) are easily extended to the random effects model and two types
of mixed effects models discussed in this section. Assuming that the reader
will have little difficulty in making the necessary extensions, we do little
more than present the analysis of variance models and analysis of variance
tables, including the expected mean squares.

The model equation, in each case, is the same as that defined in Eq. (12.14),
that is

$$x_{ij} = \mu + \alpha_i + \beta_j + e_{ij} \qquad (i = 1, \ldots, c;\ j = 1, \ldots, r)$$

and the component parts are given the same designations (names). In each
analysis of variance model $\mu$ is fixed and $e_{11}, \ldots, e_{cr}$ are normally
and independently distributed with zero means and common variance $\sigma^2$. The
effects $\alpha_i$ and $\beta_j$ may be either fixed or random, depending on the
analysis of variance model. If the $\alpha_i$ ($\beta_j$) are fixed, then they are
measured as deviates from $\mu$ such that

$$\sum_i \alpha_i = 0 \qquad \Big(\sum_j \beta_j = 0\Big)$$

If the $\alpha_i$ ($\beta_j$) are random, they are assumed to be from a normal
population with zero mean and variance $\sigma_\alpha^2$ ($\sigma_\beta^2$). Further,
it is assumed that all random effects in any model are independently
distributed. Both the $\alpha_i$ and $\beta_j$ effects are fixed in the fixed effects
model, which is denoted by fixed model. In the mixed effects model either
$\alpha_i$ is fixed and $\beta_j$ is random, or $\beta_j$ is fixed and $\alpha_i$ is
random. They are denoted by mixed model ($\alpha$) and mixed model ($\beta$),
respectively. The random effects model, in which both the $\alpha_i$ and $\beta_j$
are random, is denoted by random model.

As with the fixed effects model in the two-way classification design, the
model equation, along with the assumptions which follow, may be used to
determine the expectations of the three component sums of squares on the
right-hand side of Eq. (12.24). These expectations, together with the degrees
of freedom identity and theorems in Chaps. 7 and 9, may be applied to
determine the nature of the distributions of the component sums of squares
as well as the distributions of their ratios. The expected mean squares for
three of the four experimental models in a two-way classification design
are given in Tables 12.7 [fixed model] and 12.20 [random
model and mixed model ($\alpha$)]. Since the form for mixed model ($\beta$) is like
that for mixed model ($\alpha$), we give only the latter in Table 12.20.


Table 12.20

Analysis of Variance and Expected Mean Squares for the Two-way Classification with
One Observation per Cell

Source of               Sum of    Degrees of        Mean                     Expected Mean Squares for
Variation               Squares   Freedom           Square                   Mixed Model ($\alpha$)                       Random Model

Columns (treatments)    SSC       c - 1             SSC/(c - 1)              $\sigma^2 + r\sum_i \alpha_i^2/(c - 1)$      $\sigma^2 + r\sigma_\alpha^2$
Rows (blocks)           SSR       r - 1             SSR/(r - 1)              $\sigma^2 + c\sigma_\beta^2$                 $\sigma^2 + c\sigma_\beta^2$
Error (residual)        SSE       (c - 1)(r - 1)    SSE/[(c - 1)(r - 1)]     $\sigma^2$                                   $\sigma^2$
Total                   SST       cr - 1





No matter which experimental model is used, the fixed effects are
estimated as in the fixed model, and the components of variance are estimated
as in Example 12.1, Tables 12.7 and 12.20 being used as guides. The procedure
for testing the hypothesis

$$H_0: \alpha_1 = \cdots = \alpha_c = 0$$

when the $\alpha_i$ are fixed, the hypothesis

$$H_0: \beta_1 = \cdots = \beta_r = 0$$

when the $\beta_j$ are fixed, the hypothesis

$$H_0: \sigma_\alpha^2 = 0$$

when the $\alpha_i$ are random, or the hypothesis

$$H_0: \sigma_\beta^2 = 0$$

when the $\beta_j$ are random, may be explained in terms of the expected mean
squares shown in Tables 12.7 and 12.20. The power of any of these tests is
obtained in the usual manner. The methods for testing or establishing
confidence intervals for a linear comparison of $\alpha_1, \ldots, \alpha_c$
($\beta_1, \ldots, \beta_r$) when the effects are fixed are like those already
described. When the effects $\alpha_i$ ($\beta_j$) are random, approximate confidence
intervals for $\sigma_\alpha^2$ ($\sigma_\beta^2$) may be obtained in the usual fashion.
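Point estimates of the variance components follow from equating the observed mean squares to the expected mean squares of Table 12.20. The sketch below illustrates that step only; the function name `variance_components` and the mean squares passed to it are hypothetical and do not come from any table in this chapter.

```python
def variance_components(ms_columns, ms_rows, ms_error, c, r,
                        columns_random=True, rows_random=True):
    """Estimate variance components by equating mean squares to their expectations.

    Uses E(MS columns) = sigma^2 + r * sigma_alpha^2 and
         E(MS rows)    = sigma^2 + c * sigma_beta^2,
    which apply only to the factors treated as random.
    """
    estimates = {"sigma2": ms_error}
    if columns_random:
        estimates["sigma2_alpha"] = (ms_columns - ms_error) / r
    if rows_random:
        estimates["sigma2_beta"] = (ms_rows - ms_error) / c
    return estimates


# Hypothetical mean squares for a design with c = 3 columns and r = 4 rows.
random_model = variance_components(30.0, 18.0, 6.0, c=3, r=4)
mixed_alpha = variance_components(30.0, 18.0, 6.0, c=3, r=4, columns_random=False)
print(random_model)
print(mixed_alpha)
```

In mixed model ($\alpha$) only the row (block) component is estimated, since the column effects are fixed. An estimate computed this way can be negative when a mean square falls below the error mean square; the sketch makes no attempt to handle that case.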

12.8. EXERCISES 

12.1. (a) Complete Table 12.21 for the analysis of variance and expected
mean squares of a fixed effects randomized block design.

Table 12.21

Source of      Sum of     Degrees of   Mean      Expected
Variation      Squares    Freedom      Square    Mean Square

Blocks         26.8       4
Treatments                3
Error                                  2.5
Total          85.3

(b) Test the hypothesis that the treatment effects are equal to zero, showing
all steps in the general test procedure. (c) Identify the treatments and
blocks with variables in your own field of study, write a short statement of
an experiment which could lead to the above analysis of variance, and
write your conclusion in terms of the variables introduced.

12.2. For an experiment with the randomized block design given in Fig. 12.1,
suppose that the temperature mean square is 0.1076, the mesh mean
square is 0.0793, and the total mean square is 0.0490. (a) Test the
hypothesis that the temperature effects are all zero, showing all steps in
the general test procedure. (b) Find a 95 per cent confidence interval
for the common standard deviation $\sigma$. What assumptions are required
for doing so?

12.3. For a fixed effects experiment in a randomized block design the
observations in Table 12.22 were made. (a) Find unbiased estimates of the


Table 12.22




27 

19 

20 


Trtaimtnn 
2 3 


24 18 

20 17 

22 tb 


23 

18 


block and treatment effects. What assumptions are required? (b) Prepare
an analysis of variance table for this experiment, showing the expected
mean squares. (c) Test the hypothesis that the treatment effects are
equal, showing all steps in the general test procedure. (d) Use Duncan's
multiple range test to make a pairwise ranking of the four treatment
means. (e) Find a 95 per cent confidence interval for the difference in
the means of blocks 1 and 2. (f) Use five per cent level tests on each
of the null hypotheses

//u II -I /jj - it, - ~ 0 and Hat fu - ft. - 0 

(g) Assuming the analysis of variance table found in (b) to be for
a random effects experiment, find point and 95 per cent confidence
interval estimates of the variance components for block means and
for treatment means. (h) Use the method of Sect. 12.5 to compute
the error variance for each of the comparisons in (f). Test each of the
hypotheses in (f), using the corresponding error variance with two degrees
of freedom. Comment on these tests.

12.4. Table 12.23 for a randomized block design shows the length of life
in years (measurements are hypothetical) of an outside paint applied
under the conditions indicated (treatments fixed, mixes random).


Table 12.23

                               Treatments
           1              2               3              4
           Hardwood,      Hardwood,       Softwood,      Softwood,
Mixes      dry climate    damp climate    dry climate    wet climate

1          4.2            3.6             4.9            3.7
2          4.6            3.6             4.9            3.7
3          4.5            3.9             4.5            3.5
4          3.7            2.7             4.1            3.5
5          3.5            3.2             4.1            3.6


(a) Find unbiased estimates of the treatment effects. (b) Prepare the usual
analysis of variance table, showing the expected mean squares. (c)
Find a 99 per cent confidence interval for the mix-to-mix standard
deviation. Use this to make a statement about variation in mixes.
(d) Use 5 per cent level tests on each of the null hypotheses
$H_{01}$: $\mu_{1.} + \mu_{2.} - \mu_{3.} - \mu_{4.} = 0$,
$H_{02}$: $\mu_{1.} - \mu_{2.} + \mu_{3.} - \mu_{4.} = 0$, and
$H_{03}$: $\mu_{1.} - \mu_{2.} - \mu_{3.} + \mu_{4.} = 0$. In each case, state your
conclusion in terms of type of wood or type of climate or both. Are the three
tests independent? Why? (e) Find a 99 per cent confidence interval for
Ml ~ Ms. + Ml ~ Ml. 1 for Ml. — Ms ; for Ms • Indicate how each of 
these intervals might be used. (f) Use the method of Sect. 12.5 to compute
the error variance for each of the comparisons in (d). Test each of
the hypotheses in (d), using the corresponding error variance with
four degrees of freedom. Comment on these tests. (g) Make a table of
unbiased cell means. Find a 90 per cent confidence interval for the mean
for mix 2 and treatment 3; for mix 4 and treatment 2; for the difference
in the means in these two cells. Comment on the use of each of these
intervals.

12.5. Derive the expected mean squares in Table 12.3. 

12.6. For the two-way classification, state and prove a sum of squares identity
similar to (10.32). That is, find a sum of squares identity involving
$m$, $a_i$, and $b_j$, and prove that it holds.

12.7. In Eq. (12.23) prove that (a)

$$\sum_i \sum_j (\bar{x}_{i.} - \bar{x})(\bar{x}_{.j} - \bar{x}) = 0$$

and (b)

12.8. Prove Eq. (12.27).

12.9. Prove Eq. (12.28).

12.10. (a) Table 12.24 shows the true means in the cells (populations) of a
two-way classification. Find the true over-all mean, treatment effects,
and block effects. What would the treatment and block effects be if the
true over-all mean were 40?


Table 12.24

                   Treatment
Blocks      1      2      3      4      5

1          42     44     44     45     47
2          41     45     45     46     48
3          44     46     46     47     49
4          47     49     49     50     52


(b) Table 12.25 gives a single random observation made in each of twenty
normal populations with means shown in (a) and common standard deviation
$\sigma = 1$. Find unbiased estimates of all treatment means and effects.
Compare the estimates with the true values.


Table 12.25

                   Treatments
Blocks      1       2       3       4       5

1          40.7    45.3    44.6    45.7    47.1
2          41.6    44.4    45.1    47.4    49.2
3          44.8    45.1    45.2    47.0    51.6
4          46.6    50.0    50.0    50.2    52.4


(c) Use (b) to find an unbiased estimate of $\sigma^2$
with 12 degrees of freedom. Use (a) and (b) to find an unbiased estimate
of $\sigma^2$ with 20 degrees of freedom. Compare these two estimates. (d) Use
(b) to prepare an analysis of variance table. (e) Test at the five per cent
level the hypothesis that the treatment means are all equal. Is either
a type 1 or a type 2 error made? (f) Use a five per cent level test to
determine whether the values in Table 12.25 could have been drawn
from the five treatment populations in (a). (g) Use (b) to find 90 per cent
confidence intervals for each of the treatment means and each of the
block means. Check to see how many of these nine intervals contain
the true means. (h) Find simultaneous 90 per cent confidence intervals
for linear combinations of the treatment means. Find the limits of
20 specific linear combinations and use Table 12.24 to determine
how many of the corresponding true linear combinations fall between
these limits.

12.11. In Exercise 12.3(h), find the expected mean squares for each of the 
comparisons when (a) the 12 population variances are unequal, (b) four 
variances within a block are equal, but the three block variances are 
unequal, and (c) all variances are equal, (d) Find the expected mean 
squares for the error variances in (a). 

12.12. In Exercise 12.4(d), find the expected mean squares for each comparison 
when (a) the 20 population variances are unequal, (b) the five variances 
for a treatment are equal, but the four treatment variances are unequal, 
and (c) all variances are equal.

12.13. Prove Eqs. (12.41), (12.42), (12.43), and (12.44). 

12.14. Suppose the observation in the first block and for the first treatment
is missing from the table in Exercise 12.3. Test the hypothesis
$H_0$: $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$ by (a) the approximate test
procedure illustrated in Table 12.18, (b) the test procedure illustrated in
Table 12.19. (c) Find the 95 per cent confidence interval for
$\mu_{1.} - \mu_{2.}$. (d) Test the hypothesis
$H_0$: $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$ in the case where
the observations are missing in row 1 and column 3 and in row 2 and
column 1. (e) Test the hypothesis of (d) if all observations are missing
in block 3. (f) Test the hypothesis $H_0$: $\alpha_1 = \alpha_2 = \alpha_4 = 0$ if all
observations are missing from column 3 along with the observation in row
1 and column 1.

12.15. Prove Eq. (12.46). 

12.16. Derive a formula similar to Eq. (12.46) for the case of two missing 
observations. 

Hint. Consider two parts: one, when both missing observations 
are in the same row (or column); the other, when the observations are 
from different rows and different columns. 


12.9. RELATIVE EFFICIENCY OF DESIGNS

On at least three occasions the reader has had an opportunity to ask
which of two designs should be selected. In Chap. 8 the problem arises
when a decision must be made on whether to compare two treatment means
by pairing observations or by obtaining two independent sets of observations.
In Sect. 11.4 we again found it necessary to decide on which of two nested
designs to choose. Now, in this chapter we might well ask for some rule in
deciding whether to use the randomized block design or the completely
randomized design. To introduce a general method of comparing designs,
let us first consider an example.

Example 12.4. (a) Assuming no appreciable variation from core to core
in Example 12.1, test to determine if the acidity level is the same at different
depths of soil. (b) Compare the experiments in Examples 12.1(a) and 12.4(a).

First, note that when there is no core-to-core variation we analyze the
data in Table 12.1 as a one-way classification. This means that the sum of
squares for cores (blocks) is not partitioned out of the total sum of squares.
Now, since the treatment totals $T_i$ and grand total $T$ for the two designs are
the same, it follows that the within sum of squares for the completely
randomized design is equal to the sum of the residual and block sums of squares
for the randomized block design. That is

$$\text{Within SS} = \text{Residual SS} + \text{Block SS} \tag{12.52}$$

Substituting the sums of squares from Table 12.3 in Eq. (12.52) leads to the
analysis of variance shown in Table 12.26.


Table 12.26

Analysis of Variance for Soil Acidity in a One-Way Classification

Source of      Sum of     Degrees of   Mean                     Expected
Variation      Squares    Freedom      Square    F_c    F.05    Mean Squares


Since the computed F is less than the tabled F, we fail to reject the null hy- 
pothesis and conclude that we do not have enough evidence to say that 
the acidity level differs with soil depths 

On comparing the conclusions resulting from the two designs, observe
that the equality of acidity levels is rejected for the randomized block
design and not rejected for the completely randomized design. Since the block
mean square is 4.5 times as large as the error mean square, we expect quite
different results when the two are pooled to obtain a new error (within)
mean square. In this particular experiment we do not need a rule to know
that the randomized block design is better. But if the two experimental
error variances had nearly the same values, a decision would not be so
obvious. Thus, we now give a general procedure for selecting the better
of two designs.

In comparing two treatment means we need to know the error variances
associated with them. Thus, if $\bar{x}_1$ and $\bar{x}_2$ are means of samples of
sizes $n_1$ and $n_2$, respectively, with corresponding error variances
$\sigma_1^2$ and $\sigma_2^2$, then the mean with the smaller variance,
$\sigma_1^2/n_1$ or $\sigma_2^2/n_2$, has better precision. This can be expressed in
terms of the ratio of these two variances, which is called the relative
efficiency of mean $\bar{x}_1$ with respect to mean $\bar{x}_2$. Denoting
the relative efficiency by R.E. ($\bar{x}_1$ relative to $\bar{x}_2$), we can write

$$
\text{R.E. } (\bar{x}_1 \text{ relative to } \bar{x}_2) = \frac{\sigma_2^2/n_2}{\sigma_1^2/n_1}
\tag{12.53}
$$

or, if the means are based on equal sample sizes

$$
\text{R.E. } (\bar{x}_1 \text{ relative to } \bar{x}_2) = \frac{\sigma_2^2}{\sigma_1^2}
\tag{12.54}
$$

If the ratio is larger than one, we say that $\bar{x}_1$ is more efficient (or
precise) than $\bar{x}_2$; if the ratio is smaller than one, we say that
$\bar{x}_1$ is less efficient than $\bar{x}_2$.

We wish to define the relative efficiency of two designs which contain
the same treatments. Even though this can be done in many ways, it is
customary to use the approach indicated above. But, since the true error
variances of designs are seldom known, it is desirable to define an estimate
of relative efficiency in terms of the experimental error mean squares. If
$s_1^2$ and $s_2^2$ are unbiased estimators of $\sigma_1^2$ and $\sigma_2^2$, we could
define the estimate of efficiency of design 1 relative to design 2 by

$$
\text{est. R.E. (design 1 relative to design 2)} = \frac{s_2^2}{s_1^2}
\tag{12.55}
$$

Generally, it is understood that $s_2^2$ is the expected value of the error variance
of the second design as computed from the sums of squares and degrees of
freedom in the first design.

The estimate given by Eq. (12.55) is useful unless the number of degrees
of freedom associated with $s_1^2$ or $s_2^2$ is small. For this case we use the
definition (12.56) given by Fisher [20] and described by Cochran and Cox [10].
Fisher calculated the "amount of information" which the difference between two
estimated treatment means gives about the difference between the true
treatment means. The amount of information is $(v + 1)/[(v + 3)s^2]$, where
$s^2$ is the experimental error variance with $v$ degrees of freedom.

Note. If the variance $\sigma^2$ were known, the amount of information would
be $1/\sigma^2$.

Thus, when the numbers of observations in the treatments of the two designs
are the same, the estimate of the efficiency of design 1 relative to design
2 is given by

$$
\text{est. R.E.} = \text{est. R.E. (design 1 relative to design 2)}
= \frac{(v_1 + 1)/[(v_1 + 3)s_1^2]}{(v_2 + 1)/[(v_2 + 3)s_2^2]}
$$

or

$$
\text{est. R.E.} = \frac{(v_1 + 1)(v_2 + 3)s_2^2}{(v_2 + 1)(v_1 + 3)s_1^2}
\tag{12.56}
$$


where and s' denote the experimental error mean squares of designs 1 
and 2, respectively, and »», and v, denote their corresponding degrees of 
freedom So when treatment means are based on samples of different sizes, 
the estimate of the efficiency of design 1 relative to design 2 is defined by 


Note We might also refer to Eqs (12 56) and (12 57) as giving relofive 
in/ormafion 

Now we find the estimate of the efficiency of the randomized block design,
RBD, relative to the completely randomized design, CRD. This requires
that we obtain an estimator of the error variance, $s_2^2$, for the CRD, using
the mean squares and degrees of freedom in the RBD. If $s_1^2$ is the error
variance of the RBD, then it can be shown (see Ref. [10] for a proof) that

\[ s_2^2 = \frac{(r - 1)s_3^2 + r(c - 1)s_1^2}{rc - 1} \tag{12.58} \]

where $s_3^2$ is the block mean square and $c$ and $r$ denote the number of columns
and rows in the design.

For Example 12.4 we find that

\[ s_2^2 = \frac{3(0.06) + 8s_1^2}{11} = \frac{0.86}{33} \]

Since the sample sizes for treatments are equal ($n_1 = n_2 = 4$) and the degrees
of freedom for error variance are small ($\nu_1 = 6$ and $\nu_2 = 9$), we use Eq. (12.56)
to find

\[ \text{est. R.E. (RBD relative to CRD)} = \frac{(7)(12)s_2^2}{(10)(9)s_1^2} = 1.82 \]

Thus, the efficiency of the RBD relative to the CRD is 1.82, or 182 per cent.
We may also say that the relative gain in efficiency is 82 per cent, or 0.82.

Note. If the error variance from Table 12.21 had been used, the resulting
est. R.E. would overestimate the efficiency of the randomized block design in this
example.

As indicated in Sect. 11.4, the concept of relative efficiency may also be 
used to compare a given design to some other design of the same type having 
different sample and subsample sizes. We use Eq. (12.55) and Example 
11.2 to show how this may be done. We would like to know if the nested 
design in Example 11.2 having ten treatments with samples of size $n = 3$
and subsamples of size $r = 2$ is more or less efficient than the nested design
having ten treatments with $n = 2$ and $r = 3$. The experimental error variance
$s_1^2 = 17.55$ of the design in Example 11.2 is equal to $s^2 + rs_\epsilon^2 = 0.68 + 2(8.44)$.
Assuming that the estimates of $\sigma^2$ and $\sigma_\epsilon^2$ would remain unchanged even
though $r$ and $n$ change, we find the estimate of the experimental error
variance $s_2^2$ of the new design to be

\[ s_2^2 = 0.68 + 3(8.44) = 26.00 \]

The number of replications for each treatment in the old design is
$n_1 = rn = 2 \cdot 3 = 6$ and in the new design is $n_2 = 3 \cdot 2 = 6$. Thus, the estimated
efficiency of the old design relative to the new design is

\[ \frac{26.00}{17.55} = 1.48, \text{ or 148 per cent} \]
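
The two estimates of relative efficiency used in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the arguments `n1` and `n2` (replications per treatment) are included on the assumption that unequal sample sizes enter through the ratio of the variances of the treatment means, as in Eq. (12.53).

```python
def est_relative_efficiency(s1_sq, s2_sq, n1=1, n2=1):
    """Estimated efficiency of design 1 relative to design 2 (ratio of error variances).

    s1_sq, s2_sq: estimated experimental error variances of designs 1 and 2.
    n1, n2: replications per treatment (assumed; they cancel when equal).
    """
    return (n1 * s2_sq) / (n2 * s1_sq)


def fisher_relative_efficiency(s1_sq, v1, s2_sq, v2, n1=1, n2=1):
    """Same comparison, using Fisher's amount of information (v + 1) / ((v + 3) s^2)."""
    info1 = n1 * (v1 + 1) / ((v1 + 3) * s1_sq)
    info2 = n2 * (v2 + 1) / ((v2 + 3) * s2_sq)
    return info1 / info2


# Nested-design comparison from the text: old design 17.55, new design 26.00.
print(round(est_relative_efficiency(17.55, 26.00, n1=6, n2=6), 2))  # 1.48

# With 6 and 9 error degrees of freedom and equal variances, the
# degrees-of-freedom adjustment alone is (7)(12) / ((10)(9)).
print(round(fisher_relative_efficiency(1.0, 6, 1.0, 9), 4))  # 0.9333
```

The second call isolates the small-sample adjustment: a design with fewer error degrees of freedom is penalized even when the two error mean squares are equal.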

The reader should verify (Exercise 12.21) that the paired-observation 
experiment in Example 8.6 can be analyzed as a randomized block design 
with 16 blocks and two treatments. Then a special formula should be found 
which gives an estimate of the efficiency of a paired-observation experiment 
relative to an experiment with two independent sets of observations. 


12.10. SUBSAMPLING IN A RANDOMIZED BLOCK DESIGN

Sometimes it is desirable to subsample in a randomized block design 
just as in a completely randomized design. For example, in the acidity 
experiment (Example 12.1) let the depths of the soil be top (depth of zero 
to two in.), middle (depth of six to eight in.), and bottom (depth of 11 to 
13 in.), instead of one, seven, and 13 inches. Remove the top two inches of 
soil in a core sample, mix the soil thoroughly, and select two samples at 
random to analyze for pH level. In the same way select two samples at 
random from each of the other 11 treatment and block combinations.

Since the amount of information per observation is greatest, and since 
the calculations are less involved for an equal number of observations per 
cell, we give the analysis for this case only. The general case may be found 
in other references [2, 26, 36]. We write the model equation for subsampling 
in a randomized block design as

\[ x_{iju} = \mu + \alpha_i + \beta_j + \epsilon_{ij} + \eta_{iju}, \qquad i = 1, \ldots, c; \quad j = 1, \ldots, r; \quad u = 1, \ldots, n \tag{12.59} \]

where $x_{iju}$ denotes the $u$th random observation in the $j$th block for the $i$th
treatment, $\mu$ denotes a constant over-all mean, $\alpha_i$ denotes the effect (constant
or random) of the $i$th treatment, $\beta_j$ denotes the effect (constant or random)
of the $j$th block, $\epsilon_{ij}$ denotes the effect (constant or random) of the $j$th block
on the $i$th treatment, and $\eta_{iju}$ denotes the $u$th random effect in the $j$th block
for the $i$th treatment. We call $\epsilon_{ij}$ an experimental effect and $\eta_{iju}$ a sampling
effect.

Clearly, there are eight experimental models which have the model
equation in (12.59). At this time, we discuss only the four with random
experimental effects. (Actually, we list only three, since the two mixed models
are "symmetric" in $\alpha$ and $\beta$.) The case where $\epsilon_{ij}$ is fixed is discussed in
Chapter 13 on factorial experiments.

Since the assumptions and definitions associated with the effects $\mu$, $\alpha_i$, $\beta_j$
are the same as those in Sect. 12.7, we do not restate them here. The $\eta_{iju}$
takes the place of $\epsilon_{ij}$ in the earlier sections of this chapter, and $\epsilon_{ij}$ is a new
random component which is assumed to be normally distributed with mean
zero and variance $\sigma_\epsilon^2$. Any effects which are random are assumed to be
independently distributed.

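The cell, column, row, and grand totals are given by

\[ T_{ij.} = \sum_u x_{iju}, \quad T_{i..} = \sum_j \sum_u x_{iju}, \quad T_{.j.} = \sum_i \sum_u x_{iju}, \quad T_{...} = \sum_i \sum_j \sum_u x_{iju} \tag{12.60} \]

respectively. The cell, column, row, and grand means are given by

\[ \bar{x}_{ij.} = \frac{T_{ij.}}{n}, \quad \bar{x}_{i..} = \frac{T_{i..}}{rn}, \quad \bar{x}_{.j.} = \frac{T_{.j.}}{cn}, \quad \bar{x}_{...} = \frac{T_{...}}{crn} \tag{12.61} \]

respectively. If the $\alpha_i$ are fixed, they are estimated by

\[ a_i = \bar{x}_{i..} - \bar{x}_{...} \qquad (i = 1, \ldots, c) \]

If the $\beta_j$ are fixed, they are estimated by

\[ b_j = \bar{x}_{.j.} - \bar{x}_{...} \qquad (j = 1, \ldots, r) \]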

For any of the four experimental models under consideration, the sum of 
squares identity is given by 


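\[ SST = SSC + SSR + SSE + SSS \tag{12.62} \]

where

\[ SST = \sum_i \sum_j \sum_u (x_{iju} - \bar{x}_{...})^2 = \sum_i \sum_j \sum_u x_{iju}^2 - \frac{T_{...}^2}{crn} \tag{12.63} \]

\[ SSS = \sum_i \sum_j \sum_u (x_{iju} - \bar{x}_{ij.})^2 = \sum_i \sum_j \sum_u x_{iju}^2 - \frac{\sum_i \sum_j T_{ij.}^2}{n} \tag{12.64} \]

\[ SSC = rn \sum_i (\bar{x}_{i..} - \bar{x}_{...})^2 = \frac{\sum_i T_{i..}^2}{rn} - \frac{T_{...}^2}{crn} \tag{12.65} \]

\[ SSR = cn \sum_j (\bar{x}_{.j.} - \bar{x}_{...})^2 = \frac{\sum_j T_{.j.}^2}{cn} - \frac{T_{...}^2}{crn} \tag{12.66} \]

\[ SSE = n \sum_i \sum_j (\bar{x}_{ij.} - \bar{x}_{i..} - \bar{x}_{.j.} + \bar{x}_{...})^2 = SS(ST) - SSC - SSR \tag{12.67} \]

\[ SS(ST) = \frac{\sum_i \sum_j T_{ij.}^2}{n} - \frac{T_{...}^2}{crn} \tag{12.68} \]

Note that Eq. (12.64) may also be written as

\[ SSS = SST - SS(ST) \tag{12.64a} \]

We use SST, SSC, SSR, SSE, SSS, and SS(ST) to denote total sum of
squares, column sum of squares, row sum of squares, experimental error
sum of squares, sample error sum of squares, and sum of squares for subtotals
(cell totals), respectively.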
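
The computing formulas for these sums of squares translate directly into code. The following is a minimal sketch, assuming equal numbers of subsamples per cell; the function name and the small data set are hypothetical and serve only to show the bookkeeping with cell, column, row, and grand totals.

```python
def rbd_subsampling_ss(x):
    """Sums of squares for x[i][j][u]: treatment i, block j, subsample u."""
    c, r, n = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) for j in range(r)] for i in range(c)]    # cell totals
    col = [sum(cell[i]) for i in range(c)]                         # treatment (column) totals
    row = [sum(cell[i][j] for i in range(c)) for j in range(r)]    # block (row) totals
    grand = sum(col)                                               # grand total
    cf = grand ** 2 / (c * r * n)                                  # correction term
    sst = sum(v * v for trt in x for obs in trt for v in obs) - cf
    ss_st = sum(t * t for trt in cell for t in trt) / n - cf       # subtotal sum of squares
    ssc = sum(t * t for t in col) / (r * n) - cf
    ssr = sum(t * t for t in row) / (c * n) - cf
    return {"SST": sst, "SS(ST)": ss_st, "SSC": ssc, "SSR": ssr,
            "SSE": ss_st - ssc - ssr, "SSS": sst - ss_st}


# Hypothetical data: 2 treatments, 3 blocks, 2 subsamples per cell.
x = [[[5.1, 5.3], [4.8, 5.0], [5.6, 5.4]],
     [[6.0, 6.2], [5.5, 5.9], [6.1, 6.5]]]
ss = rbd_subsampling_ss(x)
print({name: round(value, 4) for name, value in ss.items()})
```

Here the experimental error and sampling error sums of squares are obtained by subtraction, which is how they would normally be computed by hand.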

The analysis of variance is given in Table 12.27, and expected mean 
squares for three experimental models are shown in Table 12.28. The justifi- 
cations for the entries in these tables are similar to those already given for 
other designs. As in other cases already discussed, the expected mean squares
can be used as guides in estimating variance components, testing hypotheses,
establishing confidence limits, and finding the power of a test.


Table 12.27

Analysis of Variance for Sampling in a Randomized Block Design

Source of Variation     Sum of Squares        Degrees of Freedom    Mean Square
Subtotals               SS(ST)                cr - 1
  Blocks                SSR                   r - 1                 SSR/(r - 1)
  Treatments            SSC                   c - 1                 SSC/(c - 1)
  Experimental error    SS(ST) - SSR - SSC    (c - 1)(r - 1)        SSE/[(c - 1)(r - 1)]
Sampling error          SST - SS(ST)          cr(n - 1)             SSS/[cr(n - 1)]
Total                   SST                   crn - 1


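Table 12.28

Expected Mean Squares for Sampling in a Randomized Block Design

Source of Variation     Degrees of Freedom    Expected Mean Squares

Blocks (rows)           r - 1                 Random model:          $\sigma^2 + n\sigma_\epsilon^2 + cn\sigma_\beta^2$
                                              Fixed model (α, β):    $\sigma^2 + n\sigma_\epsilon^2 + cn\sum_j \beta_j^2/(r - 1)$
                                              Mixed model (α):       $\sigma^2 + n\sigma_\epsilon^2 + cn\sigma_\beta^2$

Treatments (columns)    c - 1                 Random model:          $\sigma^2 + n\sigma_\epsilon^2 + rn\sigma_\alpha^2$
                                              Fixed model (α, β):    $\sigma^2 + n\sigma_\epsilon^2 + rn\sum_i \alpha_i^2/(c - 1)$
                                              Mixed model (α):       $\sigma^2 + n\sigma_\epsilon^2 + rn\sum_i \alpha_i^2/(c - 1)$

Experimental error      (c - 1)(r - 1)        All three models:      $\sigma^2 + n\sigma_\epsilon^2$

Sampling error          cr(n - 1)             All three models:      $\sigma^2$

Total                   crn - 1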

12.11. THE LATIN SQUARE DESIGN

The randomized block design was introduced to eliminate from the analysis
of treatment means the larger part of the variation due to heterogeneity
of the experimental material. That is, in using the randomized block design
instead of the completely randomized design, we impose one restriction
(the blocks) on the experimental units so as to decrease the error mean
square. Actually, it may be desirable in some experiments to impose two or
more restrictions on the experimental units. The Latin square design is
introduced for the case where a two-way classification (two restrictions) is
made on the experimental units.

The plan for the Latin square design with $t^2$ experimental units is to form
a square with $t$ rows and $t$ columns and then to apply $t$ treatments in the cells
in such a way that each treatment occurs only once in each column and once
in each row. In this way each row and each column contains a complete
replication of the treatments.

Table 12.29

A Plan for a Latin Square Design

        Columns
  1     2     3     4

  A     B     C     D
  B     A     D     C
  C     D     B     A
  D     C     A     B

Table 12.29 illustrates a Latin square design
with four treatments A, B, C, and D. Such a design eliminates from the
errors differences among columns as well as differences among rows and
allows a better opportunity to reduce the errors than a randomized block
design does. As an illustration consider the following example.

Example 12.5. A continuous sheet of paper eight feet wide is manufac- 
tured and placed on rolls for distribution. Later these rolls of paper are 
to be made into bags for grocery stores. It is claimed that the application 
of a certain chemical solution will increase the tensile strength of the paper. 
Four different solutions are available. The problem is to (a) estimate the 
effect of each solution on tensile strength, (b) test the hypothesis that they 
are all equal in their ability to increase tensile strength, and (c) test the 
hypothesis that the special chemical solution is equally as effective as the 
average of the other three. 

It is known that tensile strength varies with batches of raw material 
(pulpwood, etc.) and with distance from the edge of the sheet. (Tensile 
strength is least near the edges and greatest in the middle of the sheet.) 
Thus, a Latin square design is considered appropriate for the analysis. 

We obtain a particular experimental plan by the method explained 
below. For each of four batches, cut across the sheet from edge to edge to 
obtain rectangular sections eight feet long and one foot wide. Then cut and 
label four pieces from each section, as illustrated in Fig. 12.2. Denoting 
the solutions by A, B, C, D, the sections by row 1, row 2, row 3, row 4, and
the four pieces numbered 1 by column 1, the four pieces numbered 2 by 
column 2, etc., we can use the plan in Table 12.29 for applying solutions 
to pieces. For an experiment conducted according to this plan, the tensile 
strength (in pounds) of pieces of paper and the appropriate totals are given 
in Table 12.30. 


[Figure: a typical section of paper with a scale in feet from 1 to 8 and the four pieces numbered 1, 2, 3, and 4.]

Fig. 12.2 Plan for Cutting Pieces from a Typical Section
of Paper in the Tensile Strength Experiment

Before giving the solutions to the three problems of Example 12.5, we 
present the general model and analysis for the Latin square design.

Table 12.30

Tensile Strength of Paper in a Latin Square Design with Random Assignment
of Solutions Shown by the Letters in Parentheses

              Piece Number (distance from edge in inches)
Batch       1 (3'')      2 (18'')     3 (33'')     4 (48'')     Row Totals
  1         41.3(A)      38.8(B)      39.3(C)      42.2(D)        161.6
  2         37.7(B)      41.0(A)      41.2(D)      39.3(C)        159.2
  3         38.2(C)      40.4(D)      39.3(B)      43.3(A)        161.2
  4         38.8(D)      38.2(C)      42.6(A)      38.4(B)        158.0
Column
totals      156.0        158.4        162.4        163.2          640.0

Total for treatment A = 41.3 + 41.0 + 42.6 + 43.3 = 168.2
Total for treatment B = 37.7 + 38.8 + 39.3 + 38.4 = 154.2
Total for treatment C = 38.2 + 38.2 + 39.3 + 39.3 = 155.0
Total for treatment D = 38.8 + 40.4 + 41.2 + 42.2 = 162.6
Grand total = 640.0

The model equation is

\[ x_{ij(k)} = \mu + \alpha_i + \beta_j + \gamma_k + \epsilon_{ij(k)}, \qquad i, j, k = 1, \ldots, t \tag{12.69} \]

where $x_{ij(k)}$ denotes the observation for the $k$th treatment falling in the $i$th
column and $j$th row; $\mu$, $\alpha_i$, and $\beta_j$ are defined as in the two-way classification;
$\gamma_k$ is the effect of the $k$th treatment; the $\epsilon_{ij(k)}$ are independently and normally
distributed with means zero and common variance $\sigma^2$; and

\[ \sum_k \gamma_k = 0 \tag{12.70} \]

Note that this is a fixed model design. The $k$ is placed in parentheses to
indicate that it is not independent of $i$ and $j$ and to emphasize the fact that
there are not $t \cdot t \cdot t$ observations in the experiment. This model is the correct
one to use in an experiment where column and row effects are additive, and
where the errors can be made more homogeneous by imposing two restrictions
on the experimental units.

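The column, row, grand, and treatment totals are given by

\[ T_{i.} = \sum_j x_{ij(k)}, \qquad T_{.j} = \sum_i x_{ij(k)}, \qquad T_{..} = \sum_i \sum_j x_{ij(k)} = \sum_i T_{i.} = \sum_j T_{.j}, \quad \text{and} \tag{12.71} \]

\[ T_{(k)} = \text{total of all observations associated with the } k\text{th treatment} \]

respectively. The column, row, grand, and treatment means are given by

\[ \bar{x}_{i.} = \frac{T_{i.}}{t}, \qquad \bar{x}_{.j} = \frac{T_{.j}}{t}, \qquad \bar{x}_{..} = \frac{T_{..}}{t^2}, \qquad \bar{x}_{(k)} = \frac{T_{(k)}}{t} \tag{12.72} \]

respectively. Unbiased estimates $a_i$, $b_j$, and $c_k$ of the effects $\alpha_i$, $\beta_j$, and $\gamma_k$,
respectively, are given by

\[ a_i = \bar{x}_{i.} - \bar{x}_{..} \qquad i = 1, \ldots, t \]
\[ b_j = \bar{x}_{.j} - \bar{x}_{..} \qquad j = 1, \ldots, t \tag{12.73} \]
\[ c_k = \bar{x}_{(k)} - \bar{x}_{..} \qquad k = 1, \ldots, t \]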
The treatment effects are the only ones which are usually estimated. The 
sum of squares identity is 


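\[ SST = SSC + SSR + SSTr + SSE \tag{12.74} \]

where

\[ SST = \sum_i \sum_j x_{ij(k)}^2 - \frac{T_{..}^2}{t^2} \]

\[ SSC = \frac{\sum_i T_{i.}^2}{t} - \frac{T_{..}^2}{t^2} \]

\[ SSR = \frac{\sum_j T_{.j}^2}{t} - \frac{T_{..}^2}{t^2} \tag{12.75} \]

\[ SSTr = \frac{\sum_k T_{(k)}^2}{t} - \frac{T_{..}^2}{t^2} \]

\[ SSE = SST - SSC - SSR - SSTr \]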


The analysis of variance and expected mean squares are derived in the usual 
way and are shown in Table 12.31. 
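
As a check on the arithmetic, the sums of squares for a Latin square can be computed from the column, row, and treatment totals. The sketch below applies the computing formulas to the tensile strength data of Table 12.30; the function name and the layout of the `data` list are our own choices for illustration.

```python
# Table 12.30: each inner list is a batch (row); entries are (strength, solution).
data = [
    [(41.3, "A"), (38.8, "B"), (39.3, "C"), (42.2, "D")],
    [(37.7, "B"), (41.0, "A"), (41.2, "D"), (39.3, "C")],
    [(38.2, "C"), (40.4, "D"), (39.3, "B"), (43.3, "A")],
    [(38.8, "D"), (38.2, "C"), (42.6, "A"), (38.4, "B")],
]


def latin_square_anova(data):
    """Sums of squares and the treatment F ratio for a t x t Latin square."""
    t = len(data)
    grand = sum(v for row in data for v, _ in row)
    cf = grand ** 2 / t ** 2                                  # correction term
    row_tot = [sum(v for v, _ in row) for row in data]
    col_tot = [sum(data[j][i][0] for j in range(t)) for i in range(t)]
    trt_tot = {}
    for row in data:
        for v, k in row:
            trt_tot[k] = trt_tot.get(k, 0.0) + v
    sst = sum(v * v for row in data for v, _ in row) - cf
    ssc = sum(T * T for T in col_tot) / t - cf
    ssr = sum(T * T for T in row_tot) / t - cf
    sstr = sum(T * T for T in trt_tot.values()) / t - cf
    sse = sst - ssc - ssr - sstr                              # error by subtraction
    mse = sse / ((t - 1) * (t - 2))
    return {"SST": sst, "SSC": ssc, "SSR": ssr, "SSTr": sstr, "SSE": sse,
            "MSE": mse, "F_treatments": (sstr / (t - 1)) / mse}


result = latin_square_anova(data)
print({name: round(value, 2) for name, value in result.items()})
```

For these data the column, row, and treatment sums of squares come out as 8.64, 2.16, and 33.16, with 1.50 left for error.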

Table 12.31

Analysis of Variance and Expected Mean Squares for a Latin Square Design

Source of      Sum of Squares      Degrees of        Mean Square                          Expected Mean Square
Variation                          Freedom                                                for Fixed Model
Columns        SSC                 t - 1             $s_2^2 = SSC/(t - 1)$                $\sigma^2 + t\sum_i \alpha_i^2/(t - 1)$
Rows           SSR                 t - 1             $s_3^2 = SSR/(t - 1)$                $\sigma^2 + t\sum_j \beta_j^2/(t - 1)$
Treatments     SSTr                t - 1             $s_4^2 = SSTr/(t - 1)$               $\sigma^2 + t\sum_k \gamma_k^2/(t - 1)$
Error          SSE                 (t - 1)(t - 2)    $s_1^2 = SSE/[(t - 1)(t - 2)]$       $\sigma^2$
               (by subtraction)
Total          SST                 $t^2 - 1$

Now, returning to Example 12.5, we find the unbiased estimates of the effects
of the solutions on tensile strength to be

\[ \hat{\gamma}_1 = c_1 = \frac{168.2}{4} - \frac{640.0}{16} = 42.05 - 40.00 = 2.05 \]
\[ \hat{\gamma}_2 = c_2 = 38.55 - 40.00 = -1.45 \]
\[ \hat{\gamma}_3 = c_3 = 38.75 - 40.00 = -1.25 \]
\[ \hat{\gamma}_4 = c_4 = 40.65 - 40.00 = 0.65 \]

The special solution is treatment A, and the point estimate indicates that
this solution gives the largest tensile strength of any of the solutions. In order to
determine if the solutions are significantly different from each other, make
the analysis of variance shown in Table 12.32. It is clear, on looking at the
table, that the solutions have significantly different effects. (Note that the
number of degrees of freedom for the error term is very small. Generally,
in practice, squares of size 4 x 4 or less should be repeated so as to increase
the degrees of freedom for the error variance.) In fact, the solutions are
significantly different at the 0.001 level.

Table 12.32

Analysis of Variance for the Data in Table 12.30

Source of         Sum of     Degrees of    Mean                         Expected
Variation         Squares    Freedom       Square     F       F.05      Mean Square
Distances
from edge          8.64          3          2.88     11.5               $\sigma^2 + 4\sum_i \alpha_i^2/3$
Batches            2.16          3          0.72      2.9               $\sigma^2 + 4\sum_j \beta_j^2/3$
Solutions         33.16          3         11.05     44.2     4.76      $\sigma^2 + 4\sum_k \gamma_k^2/3$
Error              1.50          6          0.25
Total             45.46         15

To determine whether solution A
is significantly better than the average of the other three, compute

\[ \frac{[3(168.2) - 1(154.2) - 1(155.0) - 1(162.6)]^2}{4 \cdot 12} = 22.4 \]

and

\[ F = \frac{22.4}{0.25} = 89.6 \]

and compare with the upper five per cent value of the F distribution with
one and six degrees of freedom. Since $F_{.05}(1, 6) = 5.99$ and $F_{.0001}(1, 6) = 82.5$,
we conclude that solution A gives paper greater tensile strength than the
other three solutions, understanding that there is less than a 0.0001 chance
of making a type 1 error. Further, since

\[ t_{.025}(6) = 2.447, \qquad s_{\bar{x}} = \sqrt{0.25/4} = 0.25, \qquad t_{.025}(6)\, s_{\bar{x}} = 0.61 \]

the 95 per cent confidence intervals for the treatment means are

\[ 41.44 < \mu_A < 42.66 \]
\[ 37.94 < \mu_B < 39.16 \]
\[ 38.14 < \mu_C < 39.36 \]
\[ 40.04 < \mu_D < 41.26 \]
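
The comparison of solution A with the average of the other three, and the confidence intervals for the treatment means, can be reproduced with a short calculation. This is a sketch only: the variable names are hypothetical, and the critical value 2.447 is simply taken from the text rather than computed.

```python
import math

treatment_totals = {"A": 168.2, "B": 154.2, "C": 155.0, "D": 162.6}
coeffs = {"A": 3, "B": -1, "C": -1, "D": -1}   # A versus the average of B, C, D
t = 4            # number of observations in each treatment total
mse = 0.25       # error mean square with 6 degrees of freedom

# Sum of squares for the comparison and its F ratio (1 and 6 degrees of freedom).
comparison = sum(coeffs[k] * treatment_totals[k] for k in coeffs)
ss_comparison = comparison ** 2 / (t * sum(c * c for c in coeffs.values()))
f_ratio = ss_comparison / mse

# 95 per cent confidence intervals for the treatment means.
t_crit = 2.447                               # t_.025(6), from the text
half_width = t_crit * math.sqrt(mse / t)
intervals = {k: (T / t - half_width, T / t + half_width)
             for k, T in treatment_totals.items()}

print(round(ss_comparison, 1), round(f_ratio, 1))
for k, (low, high) in intervals.items():
    print(k, round(low, 2), round(high, 2))
```

Carrying more decimals than the text does gives a comparison sum of squares of about 22.41 and an F ratio of about 89.7, in agreement with the rounded values 22.4 and 89.6.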

The number of possible arrangements of treatments in Latin squares
increases rapidly as $t$ increases. For a 2 x 2 square, treatments may be
arranged in only two ways, namely

    A  B               B  A
    B  A      and      A  B

There are 1·3!2! = 12 different arrangements in a 3 x 3 square,
4·4!3! = 576 ways to arrange a 4 x 4 square, 56·5!4! = 161,280 ways to
arrange a 5 x 5 square, and 9408·6!5! = 812,851,200 ways to arrange
a 6 x 6 square. There are four standard 4 x 4 squares, 56 standard 5 x 5
squares, and 9408 standard 6 x 6 squares, and each standard square can be
permuted in $t!(t - 1)!$ ways. A standard square is one in which the letters
in the first row and first column are arranged in alphabetical order. The
four standard 4 x 4 Latin squares are given in Table 12.33. Fisher and Yates
[22] give all 4 x 4, 5 x 5, and 6 x 6 standard squares, along with
7 x 7, ..., 12 x 12 sample squares. Cochran and Cox [10] give sample
squares from 3 x 3 through 12 x 12. Norton [32] and Sade [35] give 562
7 x 7 squares from which it is possible to generate the 16,942,080 standard
squares. Not all standard squares of higher-order Latin squares have been
tabulated. Other discussion on the formation of Latin squares is found in
Refs. [21, 22, 26, 31, 32, 35].


Table 12.33

The Standard 4 x 4 Latin Squares

    A B C D      A B C D      A B C D      A B C D
    B A D C      B A D C      B C D A      B D A C
    C D B A      C D A B      C D A B      C A D B
    D C A B      D C B A      D A B C      D C B A
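
The counts of standard squares quoted above for small $t$ can be checked by direct enumeration. The following sketch uses a simple backtracking search; the function name is hypothetical, treatments are coded 0, 1, ..., t - 1 instead of letters, and the search is only practical for small squares.

```python
from math import factorial


def count_standard_squares(t):
    """Count t x t Latin squares whose first row and first column are in order."""
    square = [[None] * t for _ in range(t)]
    square[0] = list(range(t))              # first row in standard order
    for i in range(t):
        square[i][0] = i                    # first column in standard order

    def fill(pos):
        if pos == t * t:
            return 1                        # every cell filled: one complete square
        i, j = divmod(pos, t)
        if i == 0 or j == 0:
            return fill(pos + 1)            # cell already fixed by the standard form
        total = 0
        for s in range(t):
            row_ok = all(square[i][q] != s for q in range(j))
            col_ok = all(square[p][j] != s for p in range(i))
            if row_ok and col_ok:
                square[i][j] = s
                total += fill(pos + 1)
        square[i][j] = None
        return total

    return fill(0)


for t in (2, 3, 4, 5):
    standard = count_standard_squares(t)
    print(t, standard, standard * factorial(t) * factorial(t - 1))
```

Multiplying each count by $t!(t - 1)!$ gives the total number of arrangements, for example 4·4!3! = 576 for the 4 x 4 square.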

A random Latin square design may be obtained by selecting a standard
square at random, arranging the columns in random order, and then arranging
the last $t - 1$ rows in random order. If standard squares are not available,
it is usually adequate to construct a Latin square, then to arrange
columns and rows randomly, and finally to assign treatments to the letters
at random.
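
The randomization just described is easy to carry out by machine. The sketch below is one possible implementation, not a prescribed procedure: the function names are hypothetical, and the starting square is the plan of Table 12.29, which is in standard form.

```python
import random


def randomize_latin_square(square, rng=random):
    """Permute the columns at random, then the last t - 1 rows at random."""
    t = len(square)
    col_order = list(range(t))
    rng.shuffle(col_order)
    permuted = [[row[c] for c in col_order] for row in square]
    rest = permuted[1:]                 # the first row stays in place
    rng.shuffle(rest)
    return [permuted[0]] + rest


def is_latin(square):
    """True if every symbol occurs exactly once in each row and each column."""
    t = len(square)
    symbols = set(square[0])
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(t)} == symbols for j in range(t))
    return len(symbols) == t and rows_ok and cols_ok


standard = [list("ABCD"), list("BADC"), list("CDBA"), list("DCAB")]
plan = randomize_latin_square(standard, random.Random(1))
for row in plan:
    print(" ".join(row))
print(is_latin(plan))
```

Permuting whole rows and whole columns never destroys the Latin property, so the check at the end always succeeds; it is included only as a safeguard against transcription errors in the starting square.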

The requirement that a Latin square design must have the same number
of rows, columns, and treatments is its principal disadvantage. If the number
of treatments is small, the number of degrees of freedom for the error mean
square is small or does not exist. For a 2 x 2 square there are no degrees of
freedom, and for the 3 x 3 square there are only two degrees of freedom for
the error mean square. However, the degrees of freedom may be increased
by using more than one square in the same experiment. On the other hand,
if the number of treatments is large, the number of experimental units soon
becomes too large for practical purposes. Thus, 5 x 5 through 8 x 8 squares
are the most useful. Numerous examples of Latin square designs occur
in the literature. These designs are particularly useful in research in social
studies, agriculture, industry, medicine, and marketing studies.


12.12. MISSING PLOTS IN LATIN SQUARES


The procedures for analysis of a Latin square design with one or more
missing values are about the same as for the randomized block design.
If a single observation is missing from the cell in the $l$th column and $m$th
row, denote the dummy value by $y_{lm(p)}$, and estimate the missing value
by

\[ y_{lm(p)} = \frac{t(T_{l.} + T_{.m} + T_{(p)}) - 2T_{..}}{(t - 1)(t - 2)} \tag{12.76} \]

where

$T_{l.}$ denotes the sum of the observed values in column $l$
$T_{.m}$ denotes the sum of the observed values in row $m$
$T_{(p)}$ denotes the sum of the observed values for treatment $p$          (12.77)
$T_{..}$ denotes the sum of all observations

Then substitute the dummy value $y_{lm(p)}$ for the missing value and compute
the sums of squares in the usual way. The degrees of freedom for error mean
square is now $(t - 1)(t - 2) - 1$, and in testing the hypothesis

\[ H_0: \gamma_1 = \cdots = \gamma_t = 0 \tag{12.78} \]

a correction for bias in the treatment sum of squares should be made. The
correction is made by computing

\[ SSTr' = SSTr - B \tag{12.79} \]

where

\[ B = \frac{[T_{..} - T_{l.} - T_{.m} - (t - 1)T_{(p)}]^2}{[(t - 1)(t - 2)]^2} \tag{12.80} \]

Then the F ratio

\[ F = \frac{SSTr'/(t - 1)}{SSE/[(t - 1)(t - 2) - 1]} \tag{12.81} \]

with $t - 1$ and $(t - 1)(t - 2) - 1$ degrees of freedom is used to test (12.78).
Kramer and Glass [29] give a direct method of analysis which does not
require a correction for bias in the treatment sum of squares. The standard
error of the difference in the mean of the treatment with a missing value
and the mean of any other treatment $\bar{x}_{(k)}$ ($k \ne p$) is

\[ \sqrt{s^2 \left[ \frac{2}{t} + \frac{1}{(t - 1)(t - 2)} \right]} \tag{12.82} \]

where $s^2$ denotes the error mean square.


If more than one value is missing, the reader is referred to articles by 
Yates [43], Yates and Hale [45], DeLury [15], and Kramer and Glass [29]. 
The articles by Yates and Hale give methods for analyzing the Latin squares 
when a single observation is missing; when one or more columns, rows, or 
treatments are missing; and when one column and one or more other ob- 
servations is missing. DeLury also gives many of these methods. Kramer 
and Glass give explicit formulas for each missing value for many special 
cases, a procedure for analyzing the general case, and a direct method of 
analysis of variance which does not require correction for bias in the treat- 
ment sum of squares. 
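
For a single missing value the computations are short enough to sketch in code. The functions below assume the usual missing-plot formulas for a Latin square (the estimate built from the column, row, treatment, and grand totals of the observed values, and the corresponding bias in the treatment sum of squares); the function names are hypothetical, and the example simply pretends that the observation 42.6 in Table 12.30 had been lost.

```python
def latin_square_missing_value(t, col_total, row_total, trt_total, grand_total):
    """Dummy value for one missing cell; all totals are sums of observed values only."""
    return (t * (col_total + row_total + trt_total) - 2 * grand_total) / ((t - 1) * (t - 2))


def treatment_ss_bias(t, col_total, row_total, trt_total, grand_total):
    """Amount to subtract from the treatment sum of squares computed with the dummy value."""
    numerator = (grand_total - col_total - row_total - (t - 1) * trt_total) ** 2
    return numerator / ((t - 1) * (t - 2)) ** 2


# Hypothetical illustration: suppose the value 42.6 (column 3, row 4, solution A)
# of Table 12.30 were missing, so every total that contained it is reduced by 42.6.
missing = 42.6
col_total = 162.4 - missing
row_total = 158.0 - missing
trt_total = 168.2 - missing
grand_total = 640.0 - missing

estimate = latin_square_missing_value(4, col_total, row_total, trt_total, grand_total)
bias = treatment_ss_bias(4, col_total, row_total, trt_total, grand_total)
print(round(estimate, 2), round(bias, 2))
```

In this illustration the dummy value is 41.4, somewhat below the value that was actually observed, which is to be expected since the estimate uses only the additive column, row, and treatment effects.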


12.13. EFFICIENCY OF THE LATIN SQUARE RELATIVE TO OTHER DESIGNS 

The definitions and methods of Sect. 12.9 may be applied to find the
efficiency of the Latin square design relative to the randomized block and
completely randomized designs. If we use the definitions for mean squares
given in Table 12.31, with $s_1^2$, $s_2^2$, and $s_3^2$ denoting the error, column, and
row mean squares, the estimated error mean square, $s_{cr}^2$, for the completely
randomized design is

\[ s_{cr}^2 = \frac{s_2^2 + s_3^2 + (t - 1)s_1^2}{t + 1} \tag{12.83} \]

with $\nu_2 = t(t - 1)$ degrees of freedom. An estimate of the efficiency of the
Latin square design relative to the completely randomized design may then
be obtained by substituting $(t - 1)(t - 2)$, $t(t - 1)$, $s_1^2$, and $s_{cr}^2$, respectively,
in place of $\nu_1$, $\nu_2$, $s_1^2$, and $s_2^2$ in Eq. (12.56).

When one is comparing the Latin square design to the randomized block
design, two estimates of relative efficiency can be made, one when columns
in the Latin square are used as blocks in the randomized block design, and
the other when rows are used as blocks. If the columns are used as blocks,
the estimated error mean square $s_{rb}^2$ for the randomized block design is

\[ s_{rb}^2 = \frac{s_3^2 + (t - 1)s_1^2}{t} \tag{12.84} \]

with $(t - 1)(t - 1)$ degrees of freedom. If rows are used as blocks, replace
$s_3^2$ in Eq. (12.84) by $s_2^2$ from Table 12.31 to obtain the estimated error mean
square. On substituting Eq. (12.84) in Eq. (12.56), we have an estimate of
the efficiency of the Latin square design relative to the randomized block
design.
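
To see how these estimates behave, they may be applied to the mean squares of the tensile strength experiment (error 0.25, columns 2.88, rows 0.72, with t = 4). The sketch below is our own illustration, not a computation taken from the text: the function names are hypothetical, and the two error mean squares are formed on the assumption that the column and row mean squares are pooled with the error mean square in the way described above.

```python
def crd_error_from_latin(error_ms, column_ms, row_ms, t):
    """Estimated error mean square of a completely randomized design."""
    return (column_ms + row_ms + (t - 1) * error_ms) / (t + 1)


def rbd_error_from_latin(error_ms, other_ms, t):
    """Estimated error mean square of a randomized block design.

    other_ms is the mean square of the classification NOT used as blocks.
    """
    return (other_ms + (t - 1) * error_ms) / t


def fisher_re(s1_sq, v1, s2_sq, v2):
    """Estimated efficiency of design 1 relative to design 2, Fisher's adjustment."""
    return ((v1 + 1) * (v2 + 3) * s2_sq) / ((v2 + 1) * (v1 + 3) * s1_sq)


t, error_ms, column_ms, row_ms = 4, 0.25, 2.88, 0.72
v_latin = (t - 1) * (t - 2)

s_cr = crd_error_from_latin(error_ms, column_ms, row_ms, t)
re_vs_crd = fisher_re(error_ms, v_latin, s_cr, t * (t - 1))

s_rb = rbd_error_from_latin(error_ms, row_ms, t)       # columns used as blocks
re_vs_rbd = fisher_re(error_ms, v_latin, s_rb, (t - 1) * (t - 1))

print(round(s_cr, 3), round(re_vs_crd, 2))
print(round(s_rb, 4), round(re_vs_rbd, 2))
```

Under these assumptions the Latin square is estimated to be roughly three times as efficient as a completely randomized design for these data, and about 1.4 times as efficient as a randomized block design with distances from the edge as blocks.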

12.14. OTHER EXPERIMENTAL MODELS IN AND EXTENSIONS OF THE LATIN
SQUARE DESIGN

The analysis for the Latin square design given in Sect. 12.11 was for the
fixed model. That is, the column, row, and treatment effects were considered
fixed. However, in some experiments any one or all of the effects might be
considered random. For any set of effects which are fixed, make the assumptions
following Eq. (12.69). When the column effects are random, assume
that they are normally distributed with mean zero and variance $\sigma_\alpha^2$; when
the row effects are random, assume that they are normally distributed with
mean zero and variance $\sigma_\beta^2$; when the treatment effects are random, assume
that they are normally distributed with mean zero and variance $\sigma_\gamma^2$. Assume
all random effects to be independently distributed. The sums of squares
identity and analysis of variance are the same for each experimental model.
However, as shown in Table 12.34, the expected mean squares depend on
the model. The mixed models not shown are given by writing the second
component as $t\sigma_\theta^2$ or $t\sum \theta^2/(t - 1)$ ($\theta = \alpha$, $\beta$, or $\gamma$) according as the effect is
random or fixed. Methods already explained may be used to give testing and
estimating procedures, with the expected mean squares as guides.


Table 12.34

Analysis of Variance and Expected Mean Squares for a t x t Latin Square

Source of Variation    Mean Square                         Expected Mean Square

Columns                $s_2^2 = SSC/(t - 1)$               Random model:          $\sigma^2 + t\sigma_\alpha^2$
                                                           Mixed model (γ):       $\sigma^2 + t\sigma_\alpha^2$
                                                           Mixed model (α, γ):    $\sigma^2 + t\sum_i \alpha_i^2/(t - 1)$

Rows                   $s_3^2 = SSR/(t - 1)$               Random model:          $\sigma^2 + t\sigma_\beta^2$
                                                           Mixed model (γ):       $\sigma^2 + t\sigma_\beta^2$
                                                           Mixed model (α, γ):    $\sigma^2 + t\sigma_\beta^2$

Treatments             $s_4^2 = SSTr/(t - 1)$              Random model:          $\sigma^2 + t\sigma_\gamma^2$
                                                           Mixed model (γ):       $\sigma^2 + t\sum_k \gamma_k^2/(t - 1)$
                                                           Mixed model (α, γ):    $\sigma^2 + t\sum_k \gamma_k^2/(t - 1)$

Error                  $s_1^2 = SSE/[(t - 1)(t - 2)]$      All three models:      $\sigma^2$

In some experiments with Latin square designs it is desirable that samples
be taken in each of the experimental units. If $n$ samples are randomly taken
in each of the experimental units, the analysis of variance is a straightforward
extension of the analysis for a single observation per cell. The sums
of squares due to columns, rows, and treatments are computed in the usual
way, with the understanding that there are $nt$ observations in each total for
columns, rows, and treatments. The subtotal sum of squares SS(ST), showing
the variation among cells, is computed from the totals in the cells. The
experimental error and sampling error sum of squares are given by

\[ SSE = SS(ST) - SSC - SSR - SSTr \tag{12.85} \]

and

\[ SSS = SST - SS(ST) \tag{12.86} \]

respectively. The analysis of variance is shown in Table 12.35. The usual
techniques may be applied to find the expected mean squares for the various
experimental models. This is left as an exercise for the reader. In any case,
if we remember that the samples are random, the null hypothesis

\[ H_0: \gamma_1 = \cdots = \gamma_t = 0 \]

may be tested by comparing the ratio

\[ F = \frac{SSTr/(t - 1)}{SSE/[(t - 1)(t - 2)]} \tag{12.87} \]

with the F distribution with $t - 1$ and $(t - 1)(t - 2)$ degrees of freedom.


Table 12.35

Analysis of Variance for a t x t Latin Square with n Samples per Experimental Unit

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square
Columns                SSC               t - 1                 SSC/(t - 1)
Rows                   SSR               t - 1                 SSR/(t - 1)
Treatments             SSTr              t - 1                 SSTr/(t - 1)
Experimental error     SSE               (t - 1)(t - 2)        SSE/[(t - 1)(t - 2)]
Sampling error         SSS               $t^2(n - 1)$          SSS/[$t^2(n - 1)$]
Total                  SST               $t^2 n - 1$

Designs which impose more than two types of restrictions on the experimental
units are occasionally applied. We illustrate designs with three restrictions.
Just as Latin letters representing treatments were used in a two-way
classification to define a Latin square design, Greek letters may be used in
a Latin square to define a Graeco-Latin square design. The rule is to place
$t$ Greek letters in a Latin square with $t^2$ cells in such a way that each treatment
(Greek letter) occurs only once in each column, once in each row, and once
with each Latin letter. Examples of 3 x 3 and 4 x 4 Graeco-Latin squares
are given in Table 12.36. Graeco-Latin squares do not exist for all sizes
greater than 3 x 3. For example, a 6 x 6 Graeco-Latin square is impossible to construct.
However, squares of size $t$ do exist when $t$ is a positive odd integer greater
than 1, or a power of a prime, or a number satisfying the relation
$t = 4k + 2$ ($k = 2, 3, \ldots$). (In fact, Parker, Bose, and Shrikhande disproved
a 177-year-old conjecture when in 1958 they proved that squares of size
$t = 4k + 2$ ($k = 2, 3, \ldots$) can be constructed. See Gardner, M., "How
Three Modern Mathematicians Disproved a Celebrated Conjecture of
Leonhard Euler," Scientific American, Nov. 1959.) These designs tend to
be useful in areas where the Latin square design has proved useful. For
further reading see Refs. [6, 7, 14, 17, 19, 20, 27, 28, 30].

Table 12.36

Sample Graeco-Latin Square Designs

    Aα  Bβ  Cγ          Bα  Dβ  Aδ  Cγ
    Cβ  Aγ  Bα          Aβ  Cα  Bγ  Dδ
    Bγ  Cα  Aβ          Dγ  Bδ  Cβ  Aα
                        Cδ  Aγ  Dα  Bβ
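
The three conditions in the rule for a Graeco-Latin square can be verified mechanically. The following sketch checks them for a 3 x 3 arrangement; the function names are hypothetical, the Greek letters are written as "a", "b", "g" for convenience, and the particular square is an assumed example of the kind tabulated in Table 12.36.

```python
def is_latin(square):
    """True if each symbol occurs exactly once in every row and every column."""
    t = len(square)
    symbols = set(square[0])
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(t)} == symbols for j in range(t))
    return len(symbols) == t and rows_ok and cols_ok


def is_graeco_latin(cells):
    """cells[i][j] is a (Latin letter, Greek letter) pair."""
    t = len(cells)
    latin = [[pair[0] for pair in row] for row in cells]
    greek = [[pair[1] for pair in row] for row in cells]
    pairs = {pair for row in cells for pair in row}
    # Each Greek letter must meet each Latin letter exactly once: t * t distinct pairs.
    return is_latin(latin) and is_latin(greek) and len(pairs) == t * t


square_3 = [
    [("A", "a"), ("B", "b"), ("C", "g")],
    [("C", "b"), ("A", "g"), ("B", "a")],
    [("B", "g"), ("C", "a"), ("A", "b")],
]
print(is_graeco_latin(square_3))   # True

# Superimposing a Latin square on itself fails the "once with each Latin letter" rule.
not_orthogonal = [[(x, x.lower()) for x in row] for row in ("ABC", "BCA", "CAB")]
print(is_graeco_latin(not_orthogonal))   # False
```

The second example shows why the third condition matters: both component squares are Latin, but only three of the nine possible letter pairs occur.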

12.15. OTHER TOPICS RELATING TO ANALYSIS OF VARIANCE

In general, only methods of analysis of variance and their applications
have been presented. Many other considerations enter in experimentation,
and there are good books [2, 6, 10, 19, 20, 26, 31, 34, 36, 37, 38] on experimental
design which should be consulted. The choice of the appropriate
design, the selection of the particular treatments, the number of replications,
and the particular information required of the data are all very important
topics. In all the designs discussed we have assumed that the

(a) Samples are random
(b) Effects are additive
(c) Populations are normal                                        (12.88)
(d) Error variances are equal

What should an experimenter do in case some of these assumptions do not
hold?

The points just raised and many others need to be considered in experi- 
mentation. The student should seek for answers to his particular problems. 
Many answers are given in detail in the literature. Some questions still 
remain unanswered, but there are rules and approximations which often are 
satisfactory. Some of the above topics are now discussed very briefly. 

In Sects. 6.2 and 8.4, for example, we have already discussed for one 
and two treatments the problem of determining the appropriate sample 
size in an experiment when certain conditions are required. We have also 
pointed out that samples of equal sizes are desirable when two or more 
treatment means are compared. When the total number of observations is 
fixed, we know that equal-size samples make the computations simpler and 
the error variance for treatment means smaller. However, we have not dis- 
cussed the problem of determining the best size of an experiment or the 
size of a treatment replication. The number of treatment replications may
be found in terms of specified values of $\alpha$ (size of type 1 error) and $\beta$ (size
of type 2 error), the error variance $\sigma^2$ (or its estimate $s^2$), and the difference
to be detected. Cochran and Cox [10], Harris, Horvitz, and Mood [24], and
others [25, 39] give detailed discussions along with tables which are appropri- 
ate for finding the sample size. 

If the four assumptions in (12.88) hold, the procedures already explained 
are valid; otherwise, the procedures may lead to approximate results or 
downright false results. It should be remembered that all inferences in 
statistics require the randomness assumption, but certain adjustments may
be made so that failure of any of the other assumptions does not invalidate 
the analysis. Often a transformation can be made so that the usual analysis 
can be carried out on the transformed data. Some common transformations 
are the arc sine, square root, and logarithm transformations. The reader 
who is interested in transformations and related assumptions is referred 
to articles [3, 8, 9, 11, 12, 13, 33], to mention only a few. Applications may 
be found in these and numerous other references. 

12.16. EXERCISES 

12.17. Assuming no block-to-block variation in the randomized block design 
RBD in Exercise 12.1, we could analyze the experiment as a completely 
randomized design CRD. (a) Use Eq. (12.57) to find an estimate of the 
efficiency of the RBD relative to the CRD. (b) Pool the block and 
error sum of squares in Exercise 12.1 to find the within mean square 
of a one-way classification. Test the hypothesis that the treatment 
effects are equal to zero, (c) Discuss the desirability of separating the 
block-to-block variation from the within variation.

12.18. Answer (a), (b), and (c) of Exercise 12.17, replacing "Exercise 12.1" by
"Exercise 12.2".

12.19. Answer (a), (b), and (c) of Exercise 12.17, replacing "Exercise 12.1" by
"Exercise 12.3".

12.20. Prove Eq. (12.58).

12.21. (a) Analyze Example 8.6 as a randomized block design with 16 blocks
and two treatments. (b) Estimate the efficiency of the paired-observation
experiment relative to the experiment with two independent sets of
observations.

12.22. In the nested design in Example 11.2 there were ten treatments with
samples of size three and subsamples of size two. A new nested design
is to have the same ten treatments but with both samples and subsamples
of size two. Estimate the efficiency of the old design relative to the new
design.

12.23. Prove Eq. (12.62).

12.24. In a randomized block design with three random blocks (car loads, say),
three fixed treatments (densities, say), and two random subsamples
(determinations of similar kind), the coded determinations in Table 12.37


Table 12.37 


Blocks 


2 

3 


26 

20 

19 


28 

18 

21 


Treatments 


23 26 

20 20 
23 21 


18 

16 

14 


18 

19 

18 


were obtained. (a) Prepare the usual analysis of variance table, showing 
the expected mean squares. (b) Test the equality of the treatment effects 
at the five per cent level. (c) Find a 95 per cent confidence interval for 
the difference in the means of treatments 1 and 2. (d) Find point and 
95 per cent confidence interval estimates of the variance components for 
blocks and experimental error. (e) Assuming a random effects experiment, 
find 95 per cent confidence limits of (f) For the purpose of comparing 
treatments, estimate the efficiency of the design in (a) and (b) relative 
to a similar design with two blocks and three subsamples. 

12.25. Derive the three expected mean squares for blocks in Table 12.23. 

12.26. Prove that $a_i$, $b_j$, and $c_k$ as defined in (12.73) are unbiased 
estimators of $\alpha_i$, $\beta_j$, and $\gamma_k$, respectively. 

12.27. Derive Eq. (12.74). 

12.28. (a) Complete Table 12.38 for the analysis of variance and expected mean 
squares of a fixed effects Latin square design. (The reader may think of 

Table 12.38 


Source of Variation 

Sum of Squares    Degrees of Freedom    Mean Square    Expected Mean Square 

Columns 

5 

Rows 

4.20 

Treatments 

2.43 

Error 

0.65 

Total 

39.65 


the columns as representing schools, the rows as classes, the treatments 
as methods of teaching spelling, and the observations as grades based 
on 100 points.) (b) Test the hypothesis that the treatment effects are equal 
to zero, showing all steps in the general test procedure. 

12.29. In a marketing experiment the price of a staple item, potatoes, was 
studied. There were five cities and five types of stores in the region of study. 
Since only the five most popular kinds of potatoes were regularly sold 
in each store in each city, the Latin square design was used in the experi- 
ment. The particular random Latin square design used and the mean price 
(in cents) per ten lb (for a selected month) are shown in Table 12.39. 
(Unless otherwise indicated, the reader should consider this a fixed 
effects experiment. The size of city, type of store, and kind of potato are 


Table 12.39 

                          Type of Store 
City        1           2           3           4           5 
  1      69.2(C)     69.0(A)     63.2(B)     61.6(D)     64.5(E) 
  2      65.1(E)     64.4(D)     68.4(C)     67.5(A)     62.6(B) 
  3      63.9(B)     63.9(E)     66.7(A)     65.7(C)     62.8(D) 
  4      62.7(D)     68.2(C)     62.4(E)     62.4(B)     67.3(A) 
  5      68.1(A)     64.5(B)     61.8(D)     63.3(E)     65.8(C) 


indicated when appropriate.) (a) Prepare the usual analysis of variance 
table, showing the expected mean squares. (The reader should check to 
see if the sums of squares for columns, rows, treatments, and total are 
14.30, 5.10, 115.47, and 141.24, respectively.) (b) Use a five per cent level 
test to determine whether (1) “kind of potato” effects are equal; (2) “type 
of store” effects are equal; (3) “city” effects are equal. Write a summary 
statement. (c) Find unbiased estimates of each of the kind of potato 
effects and each of the type of store effects. Establish 95 per cent 
confidence intervals for each of the treatment means. (d) The “kinds of 
potatoes” A, B, C, D, and E are baking potato of packer X, all-purpose 
potato of packer X, baking potato of packer Y, all-purpose potato of 
packer Y, and all-purpose potato of packer Z, respectively. Knowing 
this, we wish to test more specific hypotheses than (1) of (b). Use a five 
per cent level test to determine whether (1) baking potatoes bring a 
significantly better price than all-purpose potatoes; (2) packer X and packer Y 
receive the same price on the average; (3) packer Z receives a better price 
for all-purpose potatoes than packers X and Y on the average. (e) The 
store types 1, 2, 3, 4, and 5 are large chain store downtown, small 
private store downtown, large chain store in residential shopping center, 
large private store in residential shopping center, and small private store 
in residential shopping center, respectively. With this added information 
available, state and test three hypotheses involving linear contrasts of 
type of store. (f) Use Duncan's multiple range test to rank the five 
treatment means. (g) Test hypothesis (1) in (b), assuming that the 
observations in the first row and first column are missing. Find a 90 per cent 
confidence interval for the difference in the means of treatments A and 
C. (h) Estimate the efficiency of this experiment relative to the 
completely randomized design. Also, estimate the efficiency of this experiment 
relative to the randomized block design if rows are used as blocks; if 
columns are used as blocks. Write a summary statement. 
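
The check figures quoted in part (a) of Exercise 12.29 can be reproduced with a short computation. The following is only a sketch in Python (the function name `latin_square_ss` and the data layout are our own, not part of the text). It takes the rows of Table 12.39 as cities and the columns as types of store, and it reads the entries for city 1, store type 4 and for city 3, store type 1 as 61.6(D) and 63.9(B); with these readings the four quoted sums of squares are obtained.

```python
# Sums of squares for the 5 x 5 Latin square of Table 12.39.
# Rows are cities, columns are types of store, letters are kinds of potato.
PRICES = [
    [69.2, 69.0, 63.2, 61.6, 64.5],
    [65.1, 64.4, 68.4, 67.5, 62.6],
    [63.9, 63.9, 66.7, 65.7, 62.8],
    [62.7, 68.2, 62.4, 62.4, 67.3],
    [68.1, 64.5, 61.8, 63.3, 65.8],
]
KINDS = ["CABDE", "EDCAB", "BEACD", "DCEBA", "ABDEC"]


def latin_square_ss(x, letters):
    """Return the sums of squares for a p x p Latin square."""
    p = len(x)
    grand_total = sum(sum(row) for row in x)
    correction = grand_total ** 2 / (p * p)

    ss_total = sum(v * v for row in x for v in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in x) / p - correction
    ss_cols = sum(sum(row[j] for row in x) ** 2 for j in range(p)) / p - correction

    # Accumulate one total per treatment letter.
    treatment_totals = {}
    for row, row_letters in zip(x, letters):
        for value, letter in zip(row, row_letters):
            treatment_totals[letter] = treatment_totals.get(letter, 0.0) + value
    ss_treatments = sum(t * t for t in treatment_totals.values()) / p - correction

    ss_error = ss_total - ss_rows - ss_cols - ss_treatments
    return {
        "columns": ss_cols,
        "rows": ss_rows,
        "treatments": ss_treatments,
        "error": ss_error,
        "total": ss_total,
    }


ss = latin_square_ss(PRICES, KINDS)
for source, value in ss.items():
    print(f"{source:<11}{value:8.2f}")
```

The error sum of squares is found by subtraction, as in the analysis of any Latin square.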
12.30. In an experiment with only three treatments, a single Latin square design 
does not allow enough degrees of freedom for the error mean square. 
The following data are for two replications of the same 3 x 3 Latin 
square design. 


54(C)    56(A)    53(B)    |    30(C)    60(A)    53(B) 
52(A)    47(B)    46(C)    |    56(A)    44(B)    43(C) 
50(B)    4U(C)    54(A)    |    4?(B)    40(C)    58(A) 


(a) Prepare an analysis of variance table. (b) Test the equality of the three 
treatment means using a five per cent level test. (c) Establish a 95 per 
cent confidence interval for the difference in the means of treatments 
A and B. 

12.31. Derive the expected mean squares for the mixed model (7) of Table 12.34. 

12.32. (a) Construct an experimental layout for a 5 x 5 Graeco-Latin square 
design. (b) Give an analysis of variance table for (a), indicating how 
the sums of squares are computed. (c) Write the model equation for the 
general Graeco-Latin square design, defining all technical terms. (d) 
Write the expected mean squares for all experiments in Graeco-Latin 
square designs in which the errors are random. 

AN INTRODUCTION TO FACTORIALS 


Factorials are described, related to designs, and analyzed according to 
the procedures given in earlier chapters on analysis of variance. Factorial 
experiments are compared to the so-called classical experiments. Interaction 
is illustrated and defined. Different experimental models are introduced and 
compared. The topics of fractional factorials, confounding, and split plots 
are treated in summary fashion. 

13.1. INTRODUCTION 

With the introduction of each new design, emphasis was given to 
refinements which reduce the error mean square. In these designs we thought 
of the responses or observations as being affected by only one treatment 
factor, the other factors being for the control of the experimental error. 
However, two or more treatment factors may cause variation in the 
observations. Thus, in this chapter we think of variation in observations as 
resulting from two or more treatment factors in designs with one or more 
error control factors. Experiments or studies with two or more treatment 
factors are abundant and can be found almost anywhere data are collected. 
Illustrations are found in the effect that 

1. Different temperatures and different fabrics have on the percentage of 
shrinkage during dyeing. 

2. Time of shift and amount of music have on the amount of absenteeism 
in a large plant. 

3. Nitrogen, phosphorus, and potassium have on the yield of a crop. 

4. Size of an egg, type of diet, and sex of a chick have on its weight at 
ten weeks of age. 

5. Different breakers and different gaugers have on the compressive 
strength of a hardened cement cube. 

6. Concentration of detergent, sodium carbonate, and sodium carboxy- 
methyl cellulose have on the cleaning ability of a solution. 

7. Baking temperature and recipes have on the size of a cake. 

8. Weight and sexes of subjects and rate of stimulus have on the amount 
of physical response. 

We use illustration 1 to introduce some of the terminology associated 
with factorials. Let T denote temperature and F fabric. Then three different 
temperatures may be designated by $T_1$, $T_2$, $T_3$, and four different 
fabrics by $F_1$, $F_2$, $F_3$, $F_4$. We refer to $T_1$, $T_2$, $T_3$ as levels 
of factor T, $T_1$ being termed the first level. Likewise, $F_1$, $F_2$, $F_3$, 
$F_4$ are termed the first, second, third, and fourth levels of F. The 
combination of the first level of T and the fourth level of F is denoted by 
$T_1F_4$ and called “the treatment combination of $T_1$ and $F_4$.” 

If in an experiment there are only three levels of T and four levels of F, 
then there are 12 possible treatment combinations. If each of the 12 possible 
treatment combinations is applied, the experiment is termed a factorial 
experiment, or, for short, a factorial. We also refer to this as a 3 x 4 factorial 
experiment, meaning that there are three levels of the first factor and four 
levels of the second factor in the experiment. (We do not say “factorial 
design,” since the term “factorial” has to do with the combination of treat- 
ment levels in an experiment.) The treatment combinations may be applied 
to experimental units in a completely randomized design, randomized block 
design, Latin square design, and many other designs which have not been 
described. 

13.2. AN APPLICATION OF A 3x2 FACTORIAL 

In Example 13.1 we illustrate these definitions and principles as well as 
the method of analysis for a hypothetical 3x2 factorial experiment in a 
completely randomized design. 

Example 13.1. Three levels of factor A and two levels of factor B are 
fixed in an experiment with 18 experimental units in which each of the six 
treatment combinations $A_1B_1$, $A_1B_2$, $A_2B_1$, $A_2B_2$, $A_3B_1$, $A_3B_2$ 
is to be randomly assigned to three units. Hypothetical data and notation for 
such an experiment are shown in Table 13.1. (The student may assume the data 
to be coded data for any two-factor factorial experiment he chooses to 
consider.) (a) Prepare an analysis of variance table, showing the variation 
due to factor A, factor B, and interaction of A and B. (b) Find estimates of 
effects and test appropriate hypotheses. 


Table 13.1 

Hypothetical Data for a 3 x 2 Factorial in a Completely Randomized Design 

Level of factor A                     1                               2                               3 
Level of factor B              1               2               1               2               1               2 
Treatment combination      $A_1B_1$        $A_1B_2$        $A_2B_1$        $A_2B_2$        $A_3B_1$        $A_3B_2$ 
Replication 1            $x_{111} = 24$  $x_{121} = 23$  $x_{211} = 16$  $x_{221} = 21$  $x_{311} = 19$  $x_{321} = 24$ 
Replication 2            $x_{112} = 29$  $x_{122} = 19$  $x_{212} = 11$  $x_{222} = 21$  $x_{312} = 16$  $x_{322} = 21$ 
Replication 3            $x_{113} = 25$  $x_{123} = 24$  $x_{213} = 15$  $x_{223} = 18$  $x_{313} = 16$  $x_{323} = 18$ 
Total                    $T_{11.} = 78$  $T_{12.} = 66$  $T_{21.} = 42$  $T_{22.} = 60$  $T_{31.} = 51$  $T_{32.} = 63$ 

Grand total T = 360 


Thinking of the data in Table 13.1 as a one-way classification, the 
treatment and total sums of squares are given by 

$$SSTr = \sum_{i=1}^{3}\sum_{j=1}^{2}\frac{T_{ij.}^2}{3} - \frac{T^2}{18} = \frac{78^2 + 66^2 + 42^2 + 60^2 + 51^2 + 63^2}{3} - \frac{360^2}{18} = 258$$ 

and 

$$SST = 24^2 + 29^2 + \cdots + 18^2 - \frac{360^2}{18} = 330$$ 

respectively. The within sum of squares is then found to be 

$$SSW = SST - SSTr = 72$$ 

These sums of squares, along with their corresponding degrees of freedom 
and mean squares, are shown in Table 13.2, an intermediate analysis of 
variance table. 


Table 13.2 

An Intermediate Analysis of Variance for the Data of Table 13.1 

Source of Variation       Sum of     Degrees of   Mean      Expected Mean Square 
                          Squares    Freedom      Square    for Fixed Model* 
Treatment combinations      258          5         51.6     $\sigma^2 + 3\sum_i\sum_j(\mu_{ij} - \mu)^2/5$ 
Within                       72         12          6       $\sigma^2$ 
Total                       330         17 

* See Eq. (12.10) for definition of $\mu_{ij} - \mu$. 

In order to separate the treatment combination sum of squares into 
component sums of squares for factor A, factor B, and the “interaction of 
factors A and B,” arrange the totals of Table 13.1 in a two-way classification, 
as shown in Table 13.3 (see Sect. 13.3 for a definition of interaction and 
Sect. 13.4 for a discussion of interaction). Observe that the total sum of 


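Table 13.3 

Treatment Combination Totals in a Two-way Classification 

Levels for                  Levels for Factor A                        Totals for 
Factor B             1                 2                 3             Factor B 
    1         $T_{11.} = 78$    $T_{21.} = 42$    $T_{31.} = 51$    $T_{.1.} = 171$ 
    2         $T_{12.} = 66$    $T_{22.} = 60$    $T_{32.} = 63$    $T_{.2.} = 189$ 
Totals for 
Factor A      $T_{1..} = 144$   $T_{2..} = 102$   $T_{3..} = 114$   $T = 360$ 
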


squares for the two-way classification of Table 13.3 is the same as the 
treatment sum of squares computed from Table 13.1. The sums of squares for 
factors A and B are, respectively 

$$SSA = \sum_{i=1}^{3}\frac{T_{i..}^2}{6} - \frac{T^2}{18} = \frac{144^2 + 102^2 + 114^2}{6} - \frac{360^2}{18} = 156$$ 

and 

$$SSB = \sum_{j=1}^{2}\frac{T_{.j.}^2}{9} - \frac{T^2}{18} = \frac{171^2 + 189^2}{9} - \frac{360^2}{18} = 18$$ 


Table 13.4 

Analysis of Variance and Expected Mean Squares for the Data of Table 13.1 

Source of                  Sum of     Degrees of   Mean      Computed   Expected Mean Squares for 
Variation                  Squares    Freedom      Square    F          Fixed Model $(\alpha, \beta)$* 
Treatment combinations       258          5                             $\sigma^2 + 3\sum_i\sum_j(\mu_{ij} - \mu)^2/5$ 
  Factor A                   156          2          78       13.0      $\sigma^2 + 6\sum_i\alpha_i^2/2$ 
  Factor B                    18          1          18        3.0      $\sigma^2 + 9\sum_j\beta_j^2$ 
  Interaction AB              84          2          42        7.0      $\sigma^2 + 3\sum_i\sum_j(\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu)^2/2$ 
Within                        72         12           6                 $\sigma^2$ 
Total                        330         17 

* See Sect. 12.3 for definitions of $\alpha_i$, $\beta_j$, and $\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu$. 

Denoting the sum of squares due to the interaction of A and B by SSAB, 
we find, on subtraction, that 

$$SSAB = SSTr - SSA - SSB = 84$$ 

Bringing all these sums of squares together, we have the analysis of variance 
shown in Table 13.4. That is, for the complete analysis of variance we first 
obtain the within, or error, sum of squares by making a one-way 
classification analysis, and then we partition the treatment combination sum of 
squares by doing a two-way classification analysis. Combining the two 
analyses, we see that the sum of squares identity is 

$$SST = SSA + SSB + SSAB + SSW \tag{13.1}$$ 

On comparing Eq. (13.1) with Eq. (12.62), we see that the sums of squares 
for factor A, factor B, interaction of A and B, and within are the same as 
the sums of squares for columns, rows, experimental error, and sampling 
error, respectively. It is clear from the above discussion that this analysis is 
the same as that for a two-way classification with the same number of 
observations per cell (see Sect. 12.8). But we shall soon see that the test 
procedures and interpretations are different. 
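
The arithmetic of Example 13.1 is easily checked by machine. The sketch below, in Python, is ours and not part of the original example; the function name `two_factor_anova` and the dictionary layout are assumptions made for illustration. The observations are those of Table 13.1, with the third observation for $A_1B_1$ and the second for $A_1B_2$ taken as 25 and 19, the values required by the cell totals 78 and 66. Equal replication in every cell is assumed.

```python
# Two-factor factorial in a completely randomized design (equal replication).
# Keys are (level of A, level of B); values are the replicate observations.
CELLS = {
    (1, 1): [24, 29, 25],
    (1, 2): [23, 19, 24],
    (2, 1): [16, 11, 15],
    (2, 2): [21, 21, 18],
    (3, 1): [19, 16, 16],
    (3, 2): [24, 21, 18],
}


def two_factor_anova(cells):
    """Return sums of squares and F ratios for a fixed model two-factor factorial."""
    a_levels = sorted({i for i, _ in cells})
    b_levels = sorted({j for _, j in cells})
    c, r = len(a_levels), len(b_levels)
    n = len(cells[a_levels[0], b_levels[0]])

    grand_total = sum(sum(obs) for obs in cells.values())
    correction = grand_total ** 2 / (c * r * n)

    # One-way classification: total, treatment combinations, within.
    ss_total = sum(x * x for obs in cells.values() for x in obs) - correction
    ss_tr = sum(sum(obs) ** 2 for obs in cells.values()) / n - correction
    ss_w = ss_total - ss_tr

    # Two-way classification of the cell totals: A, B, interaction.
    ss_a = sum(
        sum(sum(cells[i, j]) for j in b_levels) ** 2 for i in a_levels
    ) / (r * n) - correction
    ss_b = sum(
        sum(sum(cells[i, j]) for i in a_levels) ** 2 for j in b_levels
    ) / (c * n) - correction
    ss_ab = ss_tr - ss_a - ss_b

    ms_w = ss_w / (c * r * (n - 1))
    return {
        "SST": ss_total,
        "SSTr": ss_tr,
        "SSA": ss_a,
        "SSB": ss_b,
        "SSAB": ss_ab,
        "SSW": ss_w,
        "F_A": ss_a / (c - 1) / ms_w,
        "F_B": ss_b / (r - 1) / ms_w,
        "F_AB": ss_ab / ((c - 1) * (r - 1)) / ms_w,
    }


result = two_factor_anova(CELLS)
for name, value in result.items():
    print(f"{name:<5}{value:8.1f}")
```

With these data the factor A mean square is 156/2 = 78, so that the computed F ratios for factor A, factor B, and interaction are 13.0, 3.0, and 7.0, each with the within mean square, 6, as denominator.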

Dividing the cell totals, factor A totals, factor B totals, and grand total 
in Table 13.3 by 3, 6, 9, and 18, respectively, gives the means which are 
shown in Table 13.5. Estimates of effects are found in terms of these means. 


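Table 13.5 

Means for a 3 x 2 Factorial with Three Replications (Computed from Table 13.3) 

Levels for                       Levels for Factor A                              Means for 
Factor B               1                     2                     3              Factor B 
    1        $\bar{x}_{11.} = 26$  $\bar{x}_{21.} = 14$  $\bar{x}_{31.} = 17$  $\bar{x}_{.1.} = 19$ 
    2        $\bar{x}_{12.} = 22$  $\bar{x}_{22.} = 20$  $\bar{x}_{32.} = 21$  $\bar{x}_{.2.} = 21$ 
Means for 
Factor A     $\bar{x}_{1..} = 24$  $\bar{x}_{2..} = 17$  $\bar{x}_{3..} = 19$  $\bar{x} = 20$ 
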


The estimated effects for the different levels of factors A and B are 

$$a_1 = \bar{x}_{1..} - \bar{x} = 4$$ 
$$a_2 = \bar{x}_{2..} - \bar{x} = -3$$ 
$$a_3 = \bar{x}_{3..} - \bar{x} = -1$$ 
$$b_1 = \bar{x}_{.1.} - \bar{x} = -1$$ 
$$b_2 = \bar{x}_{.2.} - \bar{x} = 1$$ 

These effects estimate how much the means for levels $A_1$, $A_2$, $A_3$, $B_1$, $B_2$ 
deviate from the over-all mean. In a somewhat similar way we estimate the 
effect of the cells by subtracting out the sum of the over-all effect (mean), 
the factor A effect, and the factor B effect from the cell means. These estimates 
are shown in Table 13.6 and are termed the effects due to the interaction of 
A and B, or, for short, the interaction effects. The notation $(ab)_{ij}$ is used to 
denote the estimated interaction effect of the treatment combination with 
the ith level of factor A and the jth level of factor B after the over-all, $A_i$, and 
$B_j$ effects have been removed. [The expression (ab) is to be read as one 

Table 13.6 

Estimated Interaction Effects for Data of Table 13.1 

Levels for                                  Levels for Factor A 
Factor B              1                            2                            3 
    1        $(ab)_{11} = 3$              $(ab)_{21} = -2$             $(ab)_{31} = -1$ 
             [26 - 20 - 4 - (-1)]         [14 - 20 - (-3) - (-1)]      [17 - 20 - (-1) - (-1)] 
    2        $(ab)_{12} = -3$             $(ab)_{22} = 2$              $(ab)_{32} = 1$ 
             [22 - 20 - 4 - 1]            [20 - 20 - (-3) - 1]         [21 - 20 - (-1) - 1] 



symbol denoting estimated interaction effect and is not to be taken as a 
product.] The reader should notice that the sum of the estimated interaction 
effects in any column or any row is zero. That is 

$$\sum_{i=1}^{3}(ab)_{ij} = 0 \quad (j = 1, 2)$$ 
$$\sum_{j=1}^{2}(ab)_{ij} = 0 \quad (i = 1, 2, 3) \tag{13.2}$$ 
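
The estimated effects and the zero-sum property in (13.2) may be verified directly from the cell means. The following Python sketch is offered only as an illustration; the function name `estimated_effects` is ours, and the computation assumes equal replication, so that factor means are simple averages of cell means.

```python
# Cell means of Table 13.5, keyed by (level of A, level of B).
CELL_MEANS = {
    (1, 1): 26, (2, 1): 14, (3, 1): 17,
    (1, 2): 22, (2, 2): 20, (3, 2): 21,
}


def estimated_effects(cell_means):
    """Return estimated A effects, B effects, and interaction effects."""
    a_levels = sorted({i for i, _ in cell_means})
    b_levels = sorted({j for _, j in cell_means})
    grand = sum(cell_means.values()) / len(cell_means)

    # Factor effects: factor mean minus over-all mean.
    a = {
        i: sum(cell_means[i, j] for j in b_levels) / len(b_levels) - grand
        for i in a_levels
    }
    b = {
        j: sum(cell_means[i, j] for i in a_levels) / len(a_levels) - grand
        for j in b_levels
    }
    # Interaction effect: what is left of the cell mean after removing
    # the over-all mean and the two factor effects.
    ab = {
        (i, j): cell_means[i, j] - grand - a[i] - b[j]
        for i in a_levels
        for j in b_levels
    }
    return a, b, ab


a_eff, b_eff, ab_eff = estimated_effects(CELL_MEANS)
print(a_eff)   # effects for A1, A2, A3
print(b_eff)   # effects for B1, B2
print(ab_eff)  # interaction effects, as in Table 13.6
```

Summing the interaction effects over either subscript gives zero, which is why only (3 - 1)(2 - 1) = 2 of the six values in Table 13.6 are free to vary.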

Appropriate hypotheses and corresponding test procedures are indicated 
in Table 13.4. The three hypotheses are 

$H_{01}$: the true effects for factor A are equal to zero 
$H_{02}$: the true effects for factor B are equal to zero          (13.3) 
$H_{03}$: the true interaction effects are equal to zero 

To test $H_{01}$, find the ratio of the factor A mean square to the within mean 
square and compare with the upper $\alpha$ level value of the F distribution with 
two and 12 degrees of freedom. Since the computed F, 78/6 = 13.0, is larger 
than the upper five per cent F value, 3.89, we reject $H_{01}$ and conclude that 
the true effects for factor A are different. Following similar procedures, we 
fail to reject $H_{02}$, but we do reject $H_{03}$. That is, we conclude that the 
estimated effects in Table 13.6 deviate too much from zero to attribute them 
to chance fluctuations. 

We now compare the factorial experiment for Example 13.1 with the 
so-called “classical” experiment, that is, the experiment in which only one 
factor is examined, all others being held constant, or nearly constant. In 
the factorial experiment each of three hypotheses was independently tested, 
using the same error variance with 12 degrees of freedom. In the classical 
experiment we could test hypothesis $H_{01}$, using an F distribution with two 
and 12 degrees of freedom, provided each of the three levels of A was 
replicated five times; we could test hypothesis $H_{02}$, using an F distribution with 
one and 12 degrees of freedom, if each of the two levels of B was replicated 
seven times. Thus, by making 29 observations we could test the two 
hypotheses $H_{01}$ and $H_{02}$ independently with 12 degrees of freedom, just as in the 
factorial experiment. Clearly, the factorial experiment is superior to the 
experiment just described on at least two counts, namely, fewer observations 
are required, and interaction of two factors may be estimated and tested. 

It is interesting to note that the factorial experiment is frequently applied 
by investigators who have no special interest in a statistical analysis. Their 
interest lies in obtaining a “picture” of a large field of investigation rather 
than a detailed “picture” of selected topics. Thus, by applying a statistical 
analysis on a factorial experiment it would be possible, in addition to getting 
a broad “picture,” to obtain unbiased estimators of effects, to make 
significance tests involving these effects, and to establish confidence intervals for 
single means or comparisons. In these ways a statistical analysis is likely 
to give much wider applications to experimental results. 

13.3. TWO-FACTOR FACTORIAL EXPERIMENTS IN ONE-WAY 
CLASSIFICATION DESIGNS: FIXED MODEL 

Before writing model equation (12.14), that is 

$$x_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij} \quad (i = 1, \ldots, c;\ j = 1, \ldots, r)$$ 

for the two-way classification with one observation per cell, we made the 
assumption that $\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu = 0$. Letting 

$$(\alpha\beta)_{ij} = \mu_{ij} - \mu_{i.} - \mu_{.j} + \mu \quad (i = 1, \ldots, c;\ j = 1, \ldots, r) \tag{13.4}$$ 

we now consider the analysis of variance for the case where $(\alpha\beta)_{ij}$ is not 
zero. A complete analysis for such a model usually requires that more than 
one observation per cell be made. Actually, the calculations are easiest and 
the information per observation greatest when an equal number of 
observations are made in each cell. 

Thus, we consider the two-factor factorial experiment with n observations 
per treatment combination, that is, the particular two-way classification with 
n observations per cell which has the model equation 

$$x_{iju} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{iju} \quad (i = 1, \ldots, c;\ j = 1, \ldots, r;\ u = 1, \ldots, n) \tag{13.5}$$ 

where $x_{iju}$ denotes the uth random observation for the ith level of factor A 
and the jth level of factor B, $\mu$ denotes a constant over-all mean, $\alpha_i$ denotes 
the effect of the ith level of factor A, $\beta_j$ denotes the effect of the jth level of 
factor B, $(\alpha\beta)_{ij}$ denotes the effect of the ith level of factor A and the jth level 
of factor B, and $\epsilon_{iju}$ denotes the uth random effect for the ith level of factor 
A and the jth level of factor B. Expression (13.5) has the same form as the 
model equation (12.59) for subsampling in a randomized block design. In 
Eq. (12.59) three of the effects are for the control of the error variance, 
whereas only one is for the description of treatment effects, but in Eq. (13.5) 
only one effect is for control of the error variance, whereas three are for the 
description of treatments. Replacing $\delta_{ij}$ of Eq. (12.59) by $(\alpha\beta)_{ij}$ indicates at 
a glance the change in design and emphasis. For a fixed model experiment 
the $\alpha_i$, $\beta_j$, and $(\alpha\beta)_{ij}$ are assumed to be fixed, so that 

$$\sum_{i}\alpha_i = \sum_{j}\beta_j = 0$$ 
$$\sum_{i}(\alpha\beta)_{ij} = 0 \quad (j = 1, \ldots, r) \tag{13.6}$$ 
$$\sum_{j}(\alpha\beta)_{ij} = 0 \quad (i = 1, \ldots, c)$$ 

and the $\epsilon_{iju}$ are normally and independently distributed with zero means and 
common variance $\sigma^2$. 

The sum of squares identity, which does not depend on the assumptions 
regarding effects and is the same as Eq. (12.62), is illustrated in Example 13.1 
and indicated in Table 13.7. Under the assumptions for the fixed model 
experiment, model $[\alpha, \beta, (\alpha\beta)]$, it is easy to derive the expected mean 
squares shown in Table 13.7. 

The effects of factor A and the effects of factor B are estimated in the 
usual way, and the interaction effect $(\alpha\beta)_{ij}$ is estimated by 

$$(ab)_{ij} = \bar{x}_{ij.} - \bar{x}_{i..} - \bar{x}_{.j.} + \bar{x} \tag{13.7a}$$ 

or 

$$(ab)_{ij} = \bar{x}_{ij.} - \bar{x} - a_i - b_j \tag{13.7b}$$ 

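Table 13.7 

Analysis of Variance for a Fixed Model Two-Factor Factorial Experiment 
in a One-Way Classification Design 

Source of Variation       Sum of Squares               Degrees of Freedom   Mean Square              Expected Mean Square for Fixed Model $[\alpha, \beta, (\alpha\beta)]$ 
Treatment combinations    SSTr                         cr - 1               SSTr/(cr - 1)            $\sigma^2 + n\sum_i\sum_j(\mu_{ij} - \mu)^2/(cr - 1)$ 
  Factor A                SSA                          c - 1                SSA/(c - 1)              $\sigma^2 + rn\sum_i\alpha_i^2/(c - 1)$ 
  Factor B                SSB                          r - 1                SSB/(r - 1)              $\sigma^2 + cn\sum_j\beta_j^2/(r - 1)$ 
  Interaction             SSAB = SSTr - SSA - SSB      (c - 1)(r - 1)       SSAB/[(c - 1)(r - 1)]    $\sigma^2 + n\sum_i\sum_j(\alpha\beta)_{ij}^2/[(c - 1)(r - 1)]$ 
Within (error)            SSW = SST - SSTr             cr(n - 1)            SSW/[cr(n - 1)]          $\sigma^2$ 
Total                     SST                          crn - 1 

where 

$$SSTr = \sum_i\sum_j\frac{T_{ij.}^2}{n} - \frac{T^2}{crn}, \qquad SSA = \sum_i\frac{T_{i..}^2}{rn} - \frac{T^2}{crn}$$ 

$$SSB = \sum_j\frac{T_{.j.}^2}{cn} - \frac{T^2}{crn}, \qquad SST = \sum_i\sum_j\sum_u x_{iju}^2 - \frac{T^2}{crn}$$ 

For the analysis of variance of Table 13.7 the three hypotheses usually tested 
are given in (13.3). They may also be stated as 

$$H_{01}: \alpha_1 = \cdots = \alpha_c = 0$$ 
$$H_{02}: \beta_1 = \cdots = \beta_r = 0 \tag{13.8}$$ 
$$H_{03}: (\alpha\beta)_{11} = \cdots = (\alpha\beta)_{cr} = 0, \ \text{that is,}\ (\alpha\beta)_{ij} = 0 \quad (i = 1, \ldots, c;\ j = 1, \ldots, r)$$ 
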


To test $H_{01}$, compute the ratio of the factor A mean square to the within 
mean square and compare with the upper $\alpha$ level value of the F distribution 
with c - 1 and cr(n - 1) degrees of freedom. To test $H_{02}$, compare the ratio 
of the factor B mean square to the within mean square with the upper $\alpha$ 
level value of the F distribution with r - 1 and cr(n - 1) degrees of freedom. 
To test $H_{03}$, compare the ratio of the interaction mean square to the within 
mean square with the upper $\alpha$ level value of the F distribution with 
(c - 1)(r - 1) and cr(n - 1) degrees of freedom. In a fixed model experiment 
all three hypotheses in (13.8) are usually tested at the same significance 
level $\alpha$. Since the tests are independent, the over-all significance level for 
the experiment is then $1 - (1 - \alpha)^3$; that is, the probability that at least 
one of the hypotheses will be falsely rejected is $1 - (1 - \alpha)^3$. 
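
To see how quickly the over-all level grows, the expression $1 - (1 - \alpha)^3$ may be evaluated for a few common choices of $\alpha$. The short Python sketch below is ours (the function name `overall_level` is an assumption made for illustration), and it takes the three tests to be independent, as stated above.

```python
def overall_level(alpha, tests=3):
    """Probability of at least one false rejection among independent tests."""
    return 1 - (1 - alpha) ** tests


for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}  over-all level = {overall_level(alpha):.4f}")
```

For $\alpha = 0.05$ the over-all level is about 0.14, nearly three times the level of each single test.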

It is possible to test other hypotheses in a factorial experiment. The 
null hypothesis that all the treatment combination means are equal, that is, 
$\mu_{ij} = \mu$ (i = 1, . . . , c; j = 1, . . . , r), can be tested by comparing the 
ratio of the treatment combination mean square to the within mean square with 
the upper $\alpha$ level of the F distribution with cr - 1 and cr(n - 1) degrees of 
freedom. However, such a test is likely to be of no interest. Tests of linear 
comparisons are likely to be of much more interest. Usually, the linear 
comparisons involve the means for the levels of only one factor. The test 
procedure is illustrated in the following example. 

Example 13.2. Use the data of Example 13.1 to test the following two 
hypotheses about linear comparisons among the three levels of factor A 

$$\gamma_1 = 2\mu_{1.} - \mu_{2.} - \mu_{3.} = 0$$ 
$$\gamma_2 = \mu_{2.} - \mu_{3.} = 0$$ 

Using the totals of Table 13.3, we find that components of the treatment 
sum of squares are 

$$SS_{\gamma_1} = \frac{\left(\sum_i w_i T_{i..}\right)^2}{rn\sum_i w_i^2} = \frac{[2(144) - 1(102) - 1(114)]^2}{6[2^2 + (-1)^2 + (-1)^2]} = 144$$ 

and 

$$SS_{\gamma_2} = \frac{[1(102) - 1(114)]^2}{6[1^2 + (-1)^2]} = 12$$ 

The error variance is $s^2 = 6$ with 12 degrees of freedom. Thus, the computed 
F ratios are 

$$\frac{144}{6} = 24 \quad \text{and} \quad \frac{12}{6} = 2$$ 

Since the upper five per cent value of the F distribution with one and 12 
degrees of freedom is $F_{.05}(1, 12) = 4.75$, we reject the hypothesis $\gamma_1 = 0$ 
and fail to reject the hypothesis $\gamma_2 = 0$. 
That is, we conclude that the true mean response for the first level of factor 
A is greater than the mean of the other two levels, and the true mean responses 
at the second and third levels are not different. 
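
The component sums of squares of Example 13.2 follow one pattern, which the Python sketch below makes explicit. It is given only as an illustration: the function name `contrast_ss` and the symbol `obs_per_total` (the number of observations in each total, here rn = 6) are our own.

```python
def contrast_ss(totals, weights, obs_per_total):
    """Sum of squares for a linear comparison of treatment totals.

    totals        -- the totals T_i being compared
    weights       -- the comparison coefficients w_i, which must sum to zero
    obs_per_total -- number of observations making up each total
    """
    if sum(weights) != 0:
        raise ValueError("comparison coefficients must sum to zero")
    numerator = sum(w * t for w, t in zip(weights, totals)) ** 2
    denominator = obs_per_total * sum(w * w for w in weights)
    return numerator / denominator


TOTALS_A = [144, 102, 114]   # factor A totals from Table 13.3
ERROR_MS = 6                 # within mean square, 12 degrees of freedom

ss_1 = contrast_ss(TOTALS_A, [2, -1, -1], obs_per_total=6)
ss_2 = contrast_ss(TOTALS_A, [0, 1, -1], obs_per_total=6)
print(ss_1, ss_1 / ERROR_MS)   # component and F ratio for the first comparison
print(ss_2, ss_2 / ERROR_MS)   # component and F ratio for the second comparison
```

Because the two sets of coefficients are orthogonal, the components 144 and 12 add to 156, the sum of squares for factor A with two degrees of freedom.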

Duncan’s multiple range test procedure may also be applied in ranking the 
means for the levels of factor A or in ranking the means for the levels of factor 
B. Confidence intervals for means and comparisons may be obtained by the 
methods already described. Power of the tests may also be found by the 
usual methods. 

13.4. INTERACTION IN A FACTORIAL EXPERIMENT 

Since interaction is the source of variation that makes factorial 
experiments different from those already described, we now give special 
consideration to this very important concept. The term “interaction” has already been 
used several times as we referred to “interaction effects,” “the interaction of 
factors A and B,” “the sum of squares for the interaction of A and B,” and 
“the test for interaction.” It is clear that the interaction effect of the ith level 
of factor A and the jth level of factor B is defined by Eq. (13.4), that is, 
$(\alpha\beta)_{ij}$ is defined by 

$$(\alpha\beta)_{ij} = \mu_{ij} - \mu_{i.} - \mu_{.j} + \mu$$ 

Expressed in terms of factor A and factor B effects, $(\alpha\beta)_{ij}$ may be written as 

$$(\alpha\beta)_{ij} = \mu_{ij} - \alpha_i - \beta_j - \mu \tag{13.9}$$ 

It is understood that $\mu$ is an effect (mean) common to all cr populations, 
that $\alpha_i$ is an effect common to all levels of factor B, that $\beta_j$ is an effect 
common to all levels of factor A, and $(\alpha\beta)_{ij}$ an effect due to the 
combination of the ith level of A and the jth level of B, that is, a cell effect. But 
interaction may be explained in other, perhaps simpler, ways. 

Consider a 2 x 3 factorial. There are six populations with means 
$\mu_{ij}$ (i = 1, 2; j = 1, 2, 3) and common variance $\sigma^2$. If interaction does not 
exist, $\mu_{ij} = \mu + \alpha_i + \beta_j$; if it does exist, then $\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$. 
Two-way tables of population means of treatment combinations, expressed 
in terms of effects, are given in Table 13.8. If the means in the $A_1$ column 
are subtracted term for term from the means in the $A_2$ column, observe that 
the differences are always the same for the case of no interaction, but that 
the differences are not the same for the case of interaction. Further, if the 
means in the $B_1$ ($B_2$) row are subtracted term for term from the means in the 
$B_2$ ($B_3$) row, a similar thing happens. 


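Table 13.8 

Population Means in Terms of Effects 

                     No Interaction                                         Interaction 
            $A_1$                    $A_2$                      $A_1$                                          $A_2$ 
$B_1$   $\mu + \alpha_1 + \beta_1$   $\mu + \alpha_2 + \beta_1$   $\mu + \alpha_1 + \beta_1 + (\alpha\beta)_{11}$   $\mu + \alpha_2 + \beta_1 + (\alpha\beta)_{21}$ 
$B_2$   $\mu + \alpha_1 + \beta_2$   $\mu + \alpha_2 + \beta_2$   $\mu + \alpha_1 + \beta_2 + (\alpha\beta)_{12}$   $\mu + \alpha_2 + \beta_2 + (\alpha\beta)_{22}$ 
$B_3$   $\mu + \alpha_1 + \beta_3$   $\mu + \alpha_2 + \beta_3$   $\mu + \alpha_1 + \beta_3 + (\alpha\beta)_{13}$   $\mu + \alpha_2 + \beta_3 + (\alpha\beta)_{23}$ 
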

As an example, consider the particular means of two 2 x 3 factorials 
shown in Table 13.9. By taking the differences between responses at varying 
levels of A(B) for each level of B(A), we find that the pattern of differences 
is the same for each level of B(A) in Table 13.9a. In fact, at each level of B 
the difference in responses at the two levels of A is 2, and at each level of A 
the differences are 1 and 4. That is, there is no interaction between factors 
A and B in Table 13.9a. In Table 13.9b the pattern of differences at each level 
of B(A) changes. That is, at levels $B_1$ and $B_2$ the differences are 2, but at level 
$B_3$ the difference is -1. Further, at level $A_1$ the differences are 1 and 4 and 
at level $A_2$ the differences are 1 and 1. Thus, there is interaction between 
factors A and B in Table 13.9b. 


Table 13.9 

Population Means of Responses for Treatment Combinations in a 2 x 3 Factorial 

(a) No Interaction 

                  $A_1$     $A_2$     Factor B means 
$B_1$              12        14           13 
$B_2$              13        15           14 
$B_3$              17        19           18 
Factor A means     14        16           15 

(b) Interaction 

                  $A_1$     $A_2$     Factor B means 
$B_1$              12        14           13 
$B_2$              13        15           14 
$B_3$              17        16           16.5 
Factor A means     14        15           14.5 
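
The distinction between the two parts of Table 13.9 can also be checked numerically by computing $(\alpha\beta)_{ij} = \mu_{ij} - \mu_{i.} - \mu_{.j} + \mu$ for every cell. The Python sketch below is ours and purely illustrative; the function name `interaction_effects` and the list-of-rows layout (one row per level of B, one column per level of A) are assumptions.

```python
def interaction_effects(means):
    """Return cell interaction effects for a table of population means.

    means[j][i] is the mean for the (j+1)th level of B and (i+1)th level of A.
    """
    rows, cols = len(means), len(means[0])
    grand = sum(sum(row) for row in means) / (rows * cols)
    row_means = [sum(row) / cols for row in means]
    col_means = [sum(row[i] for row in means) / rows for i in range(cols)]
    return [
        [means[j][i] - row_means[j] - col_means[i] + grand for i in range(cols)]
        for j in range(rows)
    ]


TABLE_A = [[12, 14], [13, 15], [17, 19]]   # Table 13.9a
TABLE_B = [[12, 14], [13, 15], [17, 16]]   # Table 13.9b

effects_a = interaction_effects(TABLE_A)
effects_b = interaction_effects(TABLE_B)
print(effects_a)   # every entry is zero: no interaction
print(effects_b)   # nonzero entries: interaction present
```

In part (a) every cell effect vanishes, which is the same statement as the constant differences noted in the text; in part (b) the effects are -0.5 and 0.5 at $B_1$ and $B_2$, and 1 and -1 at $B_3$.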


Interaction may also be illustrated graphically by plotting the means $\mu_{i.}$ 
for the levels of factor A (means $\mu_{.j}$ for the levels of factor B) as abscissas 
and the several corresponding cell means for each level of factor A (for 
each level of factor B) as ordinates and connecting the points for each level 
of factor B (factor A) by straight-line segments. If the resulting broken-line 
polygons are parallel, there is no interaction; otherwise, there is interaction. 
Two graphs for each of the 2 x 3 factorials given in Table 13.9 are shown in 
Fig. 13.1. When there is no interaction, the broken-line polygon is actually 
a straight-line segment with slope of one. Both a and b of Fig. 13.1 illustrate 
this point. Further, c and d of Fig. 13.1 indicate the presence of interaction. 
In most cases the investigator has a choice as to which of two graphs to 
construct in illustrating the presence or absence of interaction. [It is left as 
an exercise for the reader to determine what to do in cases where two or more 
of the means $\mu_{i.}$ ($\mu_{.j}$) are the same.] After trends are discussed in Chap. 14, 
we give another geometric representation of interaction. 

Thus, in every way we have looked at interaction, we see that interaction 
of A and B exists if and only if the magnitude of the difference in response 
in changing from one level of A(B) to another depends on the level of B(A) 
at which the difference is determined. This means that two factors combine 
to produce an effect not due to either one of them alone. 

Fig. 13.1. Graphs Illustrating Absence and Presence of Interaction 

In working with sample means $\bar{x}_{ij.}$, $\bar{x}_{i..}$, and $\bar{x}_{.j.}$, constructions similar 
to those in Fig. 13.1 can be made, but, since these means are random 
variables, the absence of parallel lines no longer necessarily indicates the 
presence of interaction. Actually, when there is no interaction the lines may 
be far from parallel, particularly when the samples are small. The apparent 
interaction is declared significant when it is too large to explain on the basis 
of chance. To illustrate, Fig. 13.2 shows the significant interaction of Example 
13.1. 



Fig. 13.2 Interaction of A and B in Example 13.1 

13.5 EXTENSIONS OF THE FIXED-MODEL FACTORIAL EXPERIMENT 

Two extensions of the factorial are considered. First, we discuss the 
two-factor factorial in the randomized block design. Then we give the 
analysis for factorials with three or more factors in either the completely 
randomized, randomized block, or Latin square design. 

For the two-factor factorial in a randomized block design, the model 
equation is 

$$x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \rho_k + \epsilon_{ijk} \qquad i = 1, \ldots, c; \; j = 1, \ldots, r; \; k = 1, \ldots, b \qquad (13.10)$$

where $\mu$, $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$, and $\epsilon_{ijk}$ are defined as in Sect. 13.3, and $\rho_k$ denotes 
the effect of the kth block. We define the effects such that 

$$\sum_i \alpha_i = \sum_j \beta_j = \sum_k \rho_k = 0, \qquad \sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0 \qquad (13.11)$$

and, for the analysis, assume that the $\epsilon_{ijk}$ are normally and independently 
distributed with mean zero and common variance $\sigma^2$. It is to be understood 
that each of the cr treatment combinations is randomly assigned to the cr 
experimental units in each block. 

From these statements it follows that the sums of squares for factor A, 
factor B, interaction of A and B, and total are computed as in Table 13.7, 
and that the block and error sums of squares are given by 

$$SSBl = \frac{\sum_k T_{..k}^2}{cr} - \frac{T_{...}^2}{crb} \qquad (13.12)$$

and 

$$SSE = SST - SSTr - SSBl \qquad (13.13)$$

respectively. That is, the within sum of squares, SSW, in Table 13.7 is 
partitioned so that 

$$SSW = SSBl + SSE \qquad (13.14)$$

It is easy to derive the expected mean squares shown in Table 13.10, and to 
see that the estimation and test procedures are similar to those for a 
fixed-model factorial experiment in a completely randomized design. Since the 
error sum of squares is obtained by taking the block sum of squares out of 
the within sum of squares, the number of degrees of freedom for the error 
variance is reduced from cr(b - 1) to (cr - 1)(b - 1). 


Table 13.10 

Analysis of Variance for a Fixed-Model Two-Factor Factorial Experiment in a 
Randomized Block Design 

Source of            Sum of    Degrees of        Mean Squares                        Expected Mean Squares 
Variation            Squares   Freedom                                               for Fixed Model 

Block (replicates)   SSBl      b - 1             SSBl/(b - 1) 
Factor A             SSA       c - 1             $s_1^2$ = SSA/(c - 1)               $\sigma^2 + rb \sum_i \alpha_i^2/(c-1)$ 
Factor B             SSB       r - 1             $s_2^2$ = SSB/(r - 1)               $\sigma^2 + cb \sum_j \beta_j^2/(r-1)$ 
Interaction          SSAB      (c - 1)(r - 1)    $s_3^2$ = SSAB/[(c - 1)(r - 1)]     $\sigma^2 + b \sum_i \sum_j (\alpha\beta)_{ij}^2/[(c-1)(r-1)]$ 
Error                SSE       (cr - 1)(b - 1)   $s^2$ = SSE/[(cr - 1)(b - 1)]       $\sigma^2$ 

Total                SST       crb - 1 
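
The partition in Eqs. (13.12) through (13.14) can be checked numerically. The sketch below is only an illustration in Python: the function name, the nested-list layout x[i][j][k], and the noise-free data (a general mean of 10 plus assumed effects for A, B, and blocks) are our assumptions, not part of the text. With such additive data the interaction and error sums of squares should vanish.

```python
def rb_factorial_ss(x):
    """Sums of squares for x[i][j][k]: level i of A, level j of B, block k."""
    c, r, b = len(x), len(x[0]), len(x[0][0])
    cells = [(i, j) for i in range(c) for j in range(r)]
    grand = sum(sum(x[i][j]) for i, j in cells)
    ct = grand ** 2 / (c * r * b)                      # correction term
    sst = sum(v ** 2 for i, j in cells for v in x[i][j]) - ct
    sstr = sum(sum(x[i][j]) ** 2 for i, j in cells) / b - ct
    ssa = sum(sum(sum(x[i][j]) for j in range(r)) ** 2
              for i in range(c)) / (r * b) - ct
    ssb = sum(sum(sum(x[i][j]) for i in range(c)) ** 2
              for j in range(r)) / (c * b) - ct
    ssbl = sum(sum(x[i][j][k] for i, j in cells) ** 2
               for k in range(b)) / (c * r) - ct       # Eq. (13.12)
    return {
        "SSBl": ssbl, "SSA": ssa, "SSB": ssb,
        "SSAB": sstr - ssa - ssb,
        "SSE": sst - sstr - ssbl,                      # Eq. (13.13)
        "SST": sst,
        "df_error": (c * r - 1) * (b - 1),
    }

# Hypothetical noise-free data: mean 10 plus additive A, B, and block effects.
a_eff, b_eff, blk_eff = [-1, 1], [-2, 2], [-3, 0, 3]
x = [[[10 + a + bj + p for p in blk_eff] for bj in b_eff] for a in a_eff]
ss = rb_factorial_ss(x)
```

Here c = 2, r = 2, and b = 3, so the error variance would be estimated with (cr - 1)(b - 1) = 6 degrees of freedom.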




A two-factor factorial experiment in a Latin square design has restricted 
application due to the fact that the number of treatment combinations, cr, 
must equal the size of the square, b. Certain factorials, particularly the 2 x 3, 
2 x 4, 2 x 5, 2 x 6, and 3 x 4 factorials, may be used to advantage in 
some instances. The model equation, sum of squares identity, analysis of 
variance, and expected mean squares are easy to write as simple extensions 
of the arguments for the two-factor factorial in the completely randomized 
and randomized block designs. 

Extension of the factorial to three or more factors is straightforward. 
We discuss the case for three factors with each treatment combination 
replicated an equal number of times in a completely randomized design. (The 
analysis of factorials with more than two factors in a randomized block 
design or Latin square design is left as an exercise for the reader.) The model 
equation is given by 

$$x_{ijku} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \epsilon_{ijku}$$
$$i = 1, \ldots, c; \; j = 1, \ldots, r; \; k = 1, \ldots, l; \; u = 1, \ldots, n \qquad (13.15)$$

where the terms $x_{ijku}$, $\mu$, $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$, and $\epsilon_{ijku}$ are like those given in Sect. 
13.3. Also, $\gamma_k$ denotes the effect of the kth level of the factor C, $(\alpha\gamma)_{ik}$ denotes 
the effect of the two-factor interaction of the ith level of A and the kth level 
of C, $(\beta\gamma)_{jk}$ denotes the effect of the two-factor interaction of the jth level of 
B and the kth level of C, and $(\alpha\beta\gamma)_{ijk}$ denotes the effect of the three-factor 
interaction of the ith level of A, jth level of B, and kth level of C. These 
effects are to have the restrictions given in Eq. (13.6) along with other similar 
restrictions required because of the additional factor C. 

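The sum of squares identity has nine terms. In order to give computing 
formulas for these sums of squares, first let 

$$T_{ijk.} = \sum_u x_{ijku}$$
$$T_{ij..} = \sum_k \sum_u x_{ijku}, \qquad T_{i.k.} = \sum_j \sum_u x_{ijku}, \qquad T_{.jk.} = \sum_i \sum_u x_{ijku}$$
$$T_{i...} = \sum_j \sum_k \sum_u x_{ijku}, \qquad T_{.j..} = \sum_i \sum_k \sum_u x_{ijku}, \qquad T_{..k.} = \sum_i \sum_j \sum_u x_{ijku}$$
$$T_{....} = \sum_i \sum_j \sum_k \sum_u x_{ijku}, \qquad CT = \frac{T_{....}^2}{crln} \qquad (13.16)$$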

The necessary sums of squares are given by 

$$SST = \sum_i \sum_j \sum_k \sum_u x_{ijku}^2 - CT \qquad (13.17)$$
$$SSTr(ABC) = \frac{\sum_i \sum_j \sum_k T_{ijk.}^2}{n} - CT \qquad (13.18)$$
$$SSTr(AB) = \frac{\sum_i \sum_j T_{ij..}^2}{ln} - CT \qquad (13.19)$$
$$SSTr(AC) = \frac{\sum_i \sum_k T_{i.k.}^2}{rn} - CT \qquad (13.20)$$
$$SSTr(BC) = \frac{\sum_j \sum_k T_{.jk.}^2}{cn} - CT \qquad (13.21)$$
$$SSA = \frac{\sum_i T_{i...}^2}{rln} - CT \qquad (13.22)$$
$$SSB = \frac{\sum_j T_{.j..}^2}{cln} - CT \qquad (13.23)$$
$$SSC = \frac{\sum_k T_{..k.}^2}{crn} - CT \qquad (13.24)$$
$$SSAB = SSTr(AB) - SSA - SSB \qquad (13.25)$$
$$SSAC = SSTr(AC) - SSA - SSC \qquad (13.26)$$
$$SSBC = SSTr(BC) - SSB - SSC \qquad (13.27)$$
$$SSABC = SSTr(ABC) - SSA - SSB - SSC - SSAB - SSAC - SSBC \qquad (13.28)$$
$$SSE = SST - SSTr(ABC) \qquad (13.29)$$
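
The computing formulas (13.16) through (13.29) translate directly into a short routine. The following Python sketch is offered only as an illustration for balanced data; the function and variable names, the nested-list layout x[i][j][k][u], and the noise-free example data are our assumptions. Each sum of squares is formed from the squared marginal totals divided by the number of observations per total, less the correction term CT.

```python
from collections import defaultdict
from itertools import product

def three_factor_ss(x):
    """x[i][j][k][u]: level i of A, j of B, k of C, replicate u (balanced)."""
    c, r, l, n = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    obs = [((i, j, k), x[i][j][k][u])
           for i, j, k, u in product(range(c), range(r), range(l), range(n))]
    ct = sum(v for _, v in obs) ** 2 / len(obs)        # CT of Eq. (13.16)

    def ss_totals(axes):
        """Squared totals over the given factor axes, as in (13.18)-(13.24)."""
        totals = defaultdict(float)
        for idx, v in obs:
            totals[tuple(idx[a] for a in axes)] += v
        per_total = len(obs) // len(totals)            # observations per total
        return sum(t ** 2 for t in totals.values()) / per_total - ct

    ss = {"SST": sum(v ** 2 for _, v in obs) - ct}
    ss["SSA"], ss["SSB"], ss["SSC"] = (ss_totals((0,)), ss_totals((1,)),
                                       ss_totals((2,)))
    ss["SSAB"] = ss_totals((0, 1)) - ss["SSA"] - ss["SSB"]
    ss["SSAC"] = ss_totals((0, 2)) - ss["SSA"] - ss["SSC"]
    ss["SSBC"] = ss_totals((1, 2)) - ss["SSB"] - ss["SSC"]
    sstr_abc = ss_totals((0, 1, 2))
    lower = ("SSA", "SSB", "SSC", "SSAB", "SSAC", "SSBC")
    ss["SSABC"] = sstr_abc - sum(ss[name] for name in lower)   # Eq. (13.28)
    ss["SSE"] = ss["SST"] - sstr_abc                           # Eq. (13.29)
    return ss

# Hypothetical additive, noise-free 2 x 2 x 2 data with n = 2 replicates.
a_eff, b_eff, c_eff = [0, 2], [0, 4], [0, 6]
x = [[[[a + b + g for _ in range(2)] for g in c_eff] for b in b_eff]
     for a in a_eff]
ss = three_factor_ss(x)
```

Because the example data contain main effects only, all interaction sums of squares and the error sum of squares come out as zero, and the three main-effect sums of squares add to SST.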

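The analysis of variance and expected mean squares are shown in Table 13.11. 


Table 13.11 

Analysis of Variance for a Fixed-Model Three-Factor Factorial Experiment in a 
Completely Randomized Design 

Source of             Sum of    Degrees of                Mean      Expected Mean Square 
Variation             Squares   Freedom                   Squares   for Fixed Model 

Factor A              SSA       c - 1                     $s_1^2$   $\sigma^2 + rln \sum_i \alpha_i^2/(c-1)$ 
Factor B              SSB       r - 1                     $s_2^2$   $\sigma^2 + cln \sum_j \beta_j^2/(r-1)$ 
Factor C              SSC       l - 1                     $s_3^2$   $\sigma^2 + crn \sum_k \gamma_k^2/(l-1)$ 
Interaction A x B     SSAB      (c - 1)(r - 1)            $s_4^2$   $\sigma^2 + ln \sum_i \sum_j (\alpha\beta)_{ij}^2/[(c-1)(r-1)]$ 
Interaction A x C     SSAC      (c - 1)(l - 1)            $s_5^2$   $\sigma^2 + rn \sum_i \sum_k (\alpha\gamma)_{ik}^2/[(c-1)(l-1)]$ 
Interaction B x C     SSBC      (r - 1)(l - 1)            $s_6^2$   $\sigma^2 + cn \sum_j \sum_k (\beta\gamma)_{jk}^2/[(r-1)(l-1)]$ 
Interaction A x B x C SSABC     (c - 1)(r - 1)(l - 1)     $s_7^2$   $\sigma^2 + n \sum_i \sum_j \sum_k (\alpha\beta\gamma)_{ijk}^2/[(c-1)(r-1)(l-1)]$ 
Within (error)        SSE       crl(n - 1)                $s^2$     $\sigma^2$ 

Total                 SST       crln - 1 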




For a three-factor factorial experiment in a randomized block design, the 
block sum of squares (degrees of freedom) is taken out of the within sum of 
squares (degrees of freedom) of Table 13.11, all other sums of squares (degrees 
of freedom) remaining the same. For a factorial in a Latin square design, 
the row and column sums of squares are taken out of the within sum of 
squares of Table 13.11. If there are more than three factors in either a 
completely randomized, randomized block, or Latin square design, the methods 
already given can be used to obtain the analysis of variance. In order to 
illustrate a good computational technique for the analysis of a four-factor 
factorial in a randomized block design, we give Example 13.3. 

Example 13.3. The primary objective of this experiment was to gain 
quantitative information about the ability of various concrete mixes to 
protect reinforcing steel against corrosion over long periods of time. The 
electrical-resistance method was used to evaluate the protection given the 
reinforcing steel by the surrounding concrete. In this method a thin steel 
ribbon was cast in the thin concrete test sections to simulate the reinforcing 
steel. When corrosion of the steel occurred, the cross-sectional area of the 
steel ribbon decreased, and the electrical resistance of the ribbon increased. 
Hence, by periodic measurements of the electrical resistance of the ribbon, 
the progress of the corrosion was recorded and plotted. 

Test panels were 2.75 in. wide and 12 in. long. Their thickness T was 
either 0.750, 1.125, or 1.500 in. A steel ribbon 0.008 by 0.250 by 10 in. was 
soldered to two 0.25 in. square copper bars, each 2 in. long. These copper 
bars served as electrode terminals and protruded about one inch from the 
ends of the concrete test panel. Prior to embedment in the fresh concrete, 
the ribbon assembly was cleaned with a solvent, and the soldered joint and 
terminal were covered with three coats of Seal-Glo. The solvent was applied 
to de-oil the ribbon, and the Seal-Glo was applied to prevent electrolytic 
action between the copper, the solder, and the steel. The ribbon was centered 
in the panel with its width parallel to the panel depth. The concrete used in 
molding the test panels was made from six or seven sacks S of the same type 
of cement per cubic yard, 6 or 6.5 gallons G of water per sack of cement, and 
tap water $W_1$ or sea water $W_2$ having a specific gravity of 1.028. The sand and 
limestone were the same for all test panels. The experiment was replicated 
three times. 

The resistance R at any time divided by the initial resistance $R_0$ was taken 
as the measure of corrosion $\rho$. A graph of $\ln \rho$ against time (number of cycles 
under the accelerated test conditions) usually plotted as a straight line. Its 
"least squares" slope, b, measured the relative protection given the reinforcing, 
large numerical values of b indicating shorter life for the reinforcing steel. 

The above is a 3 x 2 x 2 x 2, or $3 \times 2^3$, factorial experiment replicated 
three times in a randomized block design. The slopes, b, of the regression 
lines multiplied by 10^ are given in Table 13.12 for the 72 test specimens. 

The replicate (block) totals are 104,901, 88,760, and 112,572, respectively. 
Other totals required for the analysis of variance are shown in Tables 13.13 
through 13.23. The border totals are useful for checking purposes, and those 
in the two-way tables are required for the sums of squares of factors G, 
W, S, and T. 





Table 13.12 

Slope of Regression Line Times 10* for a 3 x 2 x 2 x 2 Factorial Experiment in a 
Randomized Block Design* (Protection of Concrete Mixes against Corrosion of Steel) 
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yy. 

M'l 
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28 

20.240 


■Si 
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1,233 
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-705 

5,808 

* 


T, 

13 

2.30S 

-19 

13,310 


■Ss 

7i 

641 

3.026 

27 
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62 
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■plririfl 

289 
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85 

13,090 


■I 
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7,951 



T, 

-1.001 

4.374 

17 

5.04J 


s, 

Tt 

55 

1.903 
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9,772 



T, 

37 

21,620 

52 

14.300 


* Source: Applied Mechanics Department, Virginia Polytechnic Institute, 1956 


Table 13.13 

Four-way (G x W x S x T) Table Obtained by Summing over Replicates 
in Table 13.12 

                       G1                      G2 
                 W1         W2           W1         W2        Totals 

S1    T1          999     15,317          205     42,541      59,062 
      T2          448     40,760          468     27,906      69,582 
      T3          540     30,397         -307     15,065      45,695 
S2    T1         -940     10,906           58     22,293      32,317 
      T2       -6,435      8,598          113     16,812      19,088 
      T3       -2,962     51,920          214     31,317      80,489 

Totals         -8,350    157,898          751    155,934     306,233 


Table 13.14 

Three-way (W x S x T) Table Obtained by Summing over Factor G 
in Table 13.13 

                 W1         W2        Totals 

S1    T1        1,204     57,858      59,062 
      T2          916     68,666      69,582 
      T3          233     45,462      45,695 
S2    T1         -882     33,199      32,317 
      T2       -6,322     25,410      19,088 
      T3       -2,748     83,237      80,489 

Totals         -7,599    313,832     306,233 


Table 13.15 

Three-way (G x S x T) Table Obtained by Summing over Factor W 
in Table 13.13 

                 G1         G2        Totals 

S1    T1       16,316     42,746      59,062 
      T2       41,208     28,374      69,582 
      T3       30,937     14,758      45,695 
S2    T1        9,966     22,351      32,317 
      T2        2,163     16,925      19,088 
      T3       48,958     31,531      80,489 

Totals        149,548    156,685     306,233 


Table 13.16 

Three-way (G x W x T) Table Obtained by Summing over Factor S 
in Table 13.13 

                       G1                      G2 
                 W1         W2           W1         W2        Totals 

T1                 59     26,223          263     64,834      91,379 
T2             -5,987     49,358          581     44,718      88,670 
T3             -2,422     82,317          -93     46,382     126,184 

Totals         -8,350    157,898          751    155,934     306,233 


Table 13.17 

Three-way (G x W x S) Table Obtained by Summing over Factor T 
in Table 13.13 

                       G1                      G2 
                 W1         W2           W1         W2        Totals 

S1              1,987     86,474          366     85,512     174,339 
S2            -10,337     71,424          385     70,422     131,894 

Totals         -8,350    157,898          751    155,934     306,233 


Table 13.18 

G x T Table from Table 13.16 

                 G1         G2        Totals 

T1             26,282     65,097      91,379 
T2             43,371     45,299      88,670 
T3             79,895     46,289     126,184 

Totals        149,548    156,685     306,233 


Table 13.19 

W x T Table from Table 13.14 

                 W1         W2        Totals 

T1                322     91,057      91,379 
T2             -5,406     94,076      88,670 
T3             -2,515    128,699     126,184 

Totals         -7,599    313,832     306,233 


Table 13.20 

S x T Table from Table 13.13 

                 S1         S2        Totals 

T1             59,062     32,317      91,379 
T2             69,582     19,088      88,670 
T3             45,695     80,489     126,184 

Totals        174,339    131,894     306,233 


Table 13.21 

G x W Table from Table 13.17 

                 G1         G2        Totals 

W1             -8,350        751      -7,599 
W2            157,898    155,934     313,832 

Totals        149,548    156,685     306,233 


Table 13.22 

W x S Table from Table 13.14 

                 W1         W2        Totals 

S1              2,353    171,986     174,339 
S2             -9,952    141,846     131,894 

Totals         -7,599    313,832     306,233 


Table 13.23 

G x S Table from Table 13.15 

                 G1         G2        Totals 

S1             88,461     85,878     174,339 
S2             61,087     70,807     131,894 

Totals        149,548    156,685     306,233 


Once the sums of squares of factors G, W, S, and T are found, the six 
two-factor interaction sums of squares are computed in the usual way from 
Tables 13.18 through 13.23. Then the four three-factor interaction sums of 
squares are found by using formulas like Eq. (13.28). That is, any three-factor 
interaction sum of squares is obtained by subtracting from the three-factor 
total sum of squares three single-factor sums of squares and three two-factor 
interaction sums of squares. The four-factor sum of squares is obtained in 
an analogous manner by subtracting from the four-factor total sum of squares 
the four single-factor, the six two-factor, and the four three-factor sums of 
squares. This rule can be extended to any number of factors. The replicate, 
total, and error sums of squares are computed in the usual way. Table 13.24 
shows the resulting analysis of variance. 
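
As a check on the procedure, several of the sums of squares of Example 13.3 can be recomputed from the tables of totals. The Python sketch below is only an illustration; the helper name is ours, and the totals are the replicate totals and the border and cell totals of Tables 13.18 through 13.23 as we read them. Each total of the single factors G, W, and S is a sum of 36 slopes, each replicate or thickness total a sum of 24, and each G x T cell total a sum of 12.

```python
GRAND = 306_233            # grand total of the 72 slopes
N_OBS = 72
CT = GRAND ** 2 / N_OBS    # correction term

def ss_from_totals(totals, obs_per_total):
    """Sum of squared totals over observations per total, minus CT."""
    return sum(t ** 2 for t in totals) / obs_per_total - CT

ss_rep = ss_from_totals([104_901, 88_760, 112_572], 24)   # replicates
ss_g = ss_from_totals([149_548, 156_685], 36)             # gallons per sack
ss_w = ss_from_totals([-7_599, 313_832], 36)              # type of water
ss_s = ss_from_totals([174_339, 131_894], 36)             # sacks of cement
ss_t = ss_from_totals([91_379, 88_670, 126_184], 24)      # thickness

# G x T cell totals (Table 13.18); the interaction sum of squares is the
# two-factor total sum of squares less the two single-factor sums of squares.
gt_cells = [26_282, 65_097, 43_371, 45_299, 79_895, 46_289]
ss_gt = ss_from_totals(gt_cells, 12) - ss_g - ss_t
```

The values obtained in this way should agree, to within a few units of rounding, with the corresponding entries of Table 13.24.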


Table 13.24 

Analysis of Variance for the Fixed-Effects Four-Factor Factorial Experiment in 
the Randomized Block Design of Example 13.3 

Source of            Sum of Squares    Degrees of    Mean Square      Calculated 
Variation                              Freedom                        F 

Replication              12,310,937        2            6,155,469 
G (gal/sack)                707,455        1              707,455 
W (type water)        1,434,970,666        1        1,434,970,666      111.49 
S (sacks/yd^3)           25,021,917        1           25,021,917 
T (thickness)            36,472,596        2           18,236,298 
G x T                   109,279,406        2           54,639,703        4.25 
W x T                    37,805,268        2           18,902,634 
S x T                   161,459,812        2           80,729,906        6.27 
G x W                     1,700,475        1            1,700,475 
W x S                     4,417,878        1            4,417,878 
G x S                     2,102,275        1            2,102,275 
W x S x T               140,077,459        2           70,038,730        5.44 
G x S x T                37,912,665        2           18,956,333 
G x W x T               126,001,634        2           63,000,817        4.89 
G x W x S                 2,129,704        1            2,129,704 
G x W x S x T            23,227,457        2           11,613,719 
Error                   592,187,398       46           12,871,030 

Total                 2,735,474,065       71 




Since all treatment and interaction effects are fixed in this experiment, 
each hypothesis is tested by comparing the appropriate mean square with 
the error mean square. Those F ratios which are significant at the five per 
cent level are shown in Table 13.24. (Actually, since $F_{.05}(2, 46) = 3.13$, 
$F_{.01}(2, 46) = 4.92$, and $F_{.01}(1, 46) = 7.00$, we see that three of the five 
effects which are significant at the five per cent level are also significant at 
the one per cent level, and a fourth, the three-factor interaction of T, G, and 
W, is almost significant at the one per cent level.) The four significant 
interaction effects all involve thickness of concrete panels, but the thickness effects 
are not significant. The two-factor interaction effects can be interpreted 
by the methods of Sect. 13.4. In studying the three-factor interaction 
T x S x W, say, it should be noted that three factors in combination work 
in such a way that the average response at certain combinations of levels is 
significantly different from the average response at other combinations of 
levels. For example, levels $W_1$, $S_2$, and $T_2$ in combination give the best 
response; that is, concrete panels 1.125 in. thick made with tap water, 7 
sacks of cement per cubic yard, and either 6 or 6.5 gal of water give the 
least corrosion to embedded steel. (For a better understanding of the 
interactions in this experiment the interested reader can study Tables 13.14, 13.16, 
13.18, and 13.20 in some detail.) It is obvious, on looking at Table 13.13, 
that tap water is better than salt water for all treatment combinations 
considered. Since this is a fixed-level experiment, statistical statements can 
be made only about the levels included in the experiment. 
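
The F ratios quoted above are easily reproduced. The following Python sketch is an illustration only: the dictionary layout and variable names are ours, while the mean squares and the one per cent critical values are those of Table 13.24 and of the parenthetical remark in the text. Each mean square is divided by the error mean square, and the ratio is compared with the one per cent point for its degrees of freedom.

```python
ERROR_MS = 12_871_030     # error mean square of Table 13.24 (46 d.f.)

# (mean square, degrees of freedom) for the five effects flagged in the table.
effects = {
    "W": (1_434_970_666, 1),
    "G x T": (54_639_703, 2),
    "S x T": (80_729_906, 2),
    "W x S x T": (70_038_730, 2),
    "G x W x T": (63_000_817, 2),
}

# One per cent points quoted in the text, keyed by numerator d.f.
F_01 = {1: 7.00, 2: 4.92}

f_ratio = {name: ms / ERROR_MS for name, (ms, _) in effects.items()}
significant_01 = sorted(name for name, (_, df) in effects.items()
                        if f_ratio[name] > F_01[df])
```

With these figures the list `significant_01` contains W, S x T, and W x S x T, the three effects referred to in the text, while G x W x T falls just short of the one per cent point.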

The above computational procedure was given on account of the many 
checks, and because the data can be studied as the tables of totals are 
constructed in preparation for the analysis of variance. Then, after certain 
effects are declared significant, these same tables of totals may be used in 
a detailed examination of differences in the experiment. Further, these tables 
are useful in finding estimates of effects and in presenting final conclusions 
resulting from the experiment. 

Special important cases of factorials, such as $2^n$ and $3^n$ factorials, may 
be analyzed by shorter, more appropriate techniques. Yates [56, 57] has 
presented a special notation and a special method for computing mean 
squares and interpreting results, and others [5, 12, 15, 19, 22, 23, 30, 46] 
give important details for analysis of certain regular factorials. The interested 
reader can consult these and other reference exercises in this 

'"tow a , 


chapter for an understanding of the a recn - 


. 313 8J2 


^The response to a quantitative factof ~''‘oesr^ 
the vjTm^s 1cvc\ If .'/ns is fb^eJse*^ 

.mo occoSr. in ibe analysis s!\> r.; 51“-«a .J ‘Ki, 305 2M 

could be specified m advance of th?v me * 

presented so far are incomplete Chap e 

return to the problem of trends in anafyiftfi. sfi 


study Refs (f4,4l. 54] for further mfortrts or , 
tative factors and interactions >\ 


13.6 RANDOM-EFFECTS AND MIXED-EFFECTS TWO-FACTOR 
EXPERIMENTS IN A ONE-WAY CLASSIFICATION 



The model equation and sum of squares identity for the random and 
mixed-effects two-factor factorials are the same as those given in Sect. 13.3 
for the fixed model. But the expected mean squares and, thus, the tests 
of hypotheses and estimation of variance components depend on the 
assumptions regarding the effects. 

For the random-model factorial experiment with n observations per 
treatment combination, we assume that the effects $\alpha_i$ and $\beta_j$ are 
independently selected from normal populations with zero means and variances 
$\sigma_\alpha^2$ and $\sigma_\beta^2$, respectively. The interaction effects $(\alpha\beta)_{ij}$ are also assumed to 
be independently and normally distributed with mean zero and variance $\sigma_{\alpha\beta}^2$. 
The restrictions in Eq. (13.6) do not hold for the random model. 

Two mixed models are possible, but, since they are symmetric in A and 
B, we consider only one. Assume that the $\alpha_i$ are fixed so that 

$$\sum_i \alpha_i = 0, \qquad \sum_i (\alpha\beta)_{ij} = 0 \qquad j = 1, \ldots, r \qquad (13.30)$$

Note that we do not assume that 

$$\sum_j (\alpha\beta)_{ij} = 0 \qquad i = 1, \ldots, c$$

The $\beta_j$ are assumed to be randomly and normally distributed with mean zero 
and variance $\sigma_\beta^2$. Since the levels of A are fixed and the levels of B are 
randomly selected, the effects $(\alpha\beta)_{ij}$ are assumed to be independently and 
normally distributed with mean zero and variance $\sigma_{\alpha\beta}^2$. All random effects 
in all models are independently distributed. 

Under the assumptions given above, the analysis of variance and expected 
mean squares shown in Table 13.25 along with the distribution of variance 
ratios are easy to derive. As in other cases, the expected mean squares can 
be used as guides in estimating variance components, testing hypotheses, 
establishing confidence limits, and finding power for the various tests. 

The test procedures for the random and mixed models are clearly 
indicated by the expected mean squares. Nevertheless, it is important to note 
that for the mixed model the mean square for fixed effects $\alpha_i$ is compared 
with the interaction mean square, whereas the mean square for the random 
effects $\beta_j$ is compared with the within mean square. 
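
The test procedures just described, and the estimators of variance components suggested by the expected mean squares of Table 13.25, can be sketched as follows. This Python fragment is an illustration under stated assumptions: the function name, argument names, and the numerical mean squares are hypothetical, and the estimators are the usual ones obtained by equating mean squares to their expectations. Here ms_a, ms_b, ms_ab, and ms_err stand for $s_1^2$, $s_2^2$, $s_3^2$, and $s^2$.

```python
def two_factor_tests(ms_a, ms_b, ms_ab, ms_err, c, r, n, model):
    """F ratios and variance-component estimates for a two-factor factorial.

    model = "random": A and B random.
    model = "mixed":  A fixed, B random, as in mixed model (alpha).
    """
    out = {
        "F_A": ms_a / ms_ab,                   # A tested against interaction
        "F_AB": ms_ab / ms_err,                # interaction against within
        "var_ab": (ms_ab - ms_err) / n,        # estimate of sigma^2_{alpha beta}
    }
    if model == "random":
        out["F_B"] = ms_b / ms_ab
        out["var_a"] = (ms_a - ms_ab) / (r * n)
        out["var_b"] = (ms_b - ms_ab) / (c * n)
    elif model == "mixed":
        out["F_B"] = ms_b / ms_err             # random B tested against within
        out["var_b"] = (ms_b - ms_err) / (c * n)
    else:
        raise ValueError("model must be 'random' or 'mixed'")
    return out

# Hypothetical mean squares for c = 3, r = 4, n = 2.
random_result = two_factor_tests(60.0, 40.0, 10.0, 5.0, 3, 4, 2, "random")
mixed_result = two_factor_tests(60.0, 40.0, 10.0, 5.0, 3, 4, 2, "mixed")
```

The only difference between the two calls lies in the treatment of factor B: its mean square is divided by the interaction mean square in the random model and by the within mean square in the mixed model.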


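Table 13.25 

Analysis of Variance and Expected Mean Squares for Random- and Mixed-Model 
Two-Factor Factorial Experiments in a One-Way Classification 

Source of         Mean       Expected Mean Squares for 
Variation         Squares    Random Model                                                  Mixed Model ($\alpha$) 

Factor A          $s_1^2$    $\sigma^2 + n\sigma_{\alpha\beta}^2 + rn\sigma_\alpha^2$      $\sigma^2 + n\sigma_{\alpha\beta}^2 + rn \sum_i \alpha_i^2/(c-1)$ 
Factor B          $s_2^2$    $\sigma^2 + n\sigma_{\alpha\beta}^2 + cn\sigma_\beta^2$       $\sigma^2 + cn\sigma_\beta^2$ 
Interaction       $s_3^2$    $\sigma^2 + n\sigma_{\alpha\beta}^2$                          $\sigma^2 + n\sigma_{\alpha\beta}^2$ 
Within (error)    $s^2$      $\sigma^2$                                                    $\sigma^2$ 


13.7 EXTENSIONS OF THE RANDOM- AND MIXED-MODEL FACTORIAL 
EXPERIMENT 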

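We consider in turn the analysis of variance for three-factor factorial 
experiments in a completely randomized design in which three factors are 
random, two factors are random, and one factor is random. The notation, 
model equation (13.15), and sum of squares identity are the same as in 
Sect. 13.5. 

For the random model assume the effects $\alpha_i$, $\beta_j$, $\gamma_k$, $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, 
$(\beta\gamma)_{jk}$, $(\alpha\beta\gamma)_{ijk}$, and $\epsilon_{ijku}$ to be independently and randomly distributed with means 
zero and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_\gamma^2$, $\sigma_{\alpha\beta}^2$, $\sigma_{\alpha\gamma}^2$, $\sigma_{\beta\gamma}^2$, $\sigma_{\alpha\beta\gamma}^2$, and $\sigma^2$, respectively. The 
analysis of variance and expected mean squares are shown in Table 13.26. 
From the expected mean squares of Table 13.26 it is clear that there are exact 
procedures for testing all two-factor and three-factor interaction variance 
components. However, exact tests for the hypotheses 

$$H_{01}: \sigma_\alpha^2 = 0, \qquad H_{02}: \sigma_\beta^2 = 0, \qquad H_{03}: \sigma_\gamma^2 = 0$$

do not exist, and an approximate 
test (see Sect. 11.4) due to Satterthwaite [43] may be applied. In the 
approximate test of $H_{01}$, compute 

$$\frac{s_1^2}{s_4^2 + s_5^2 - s_7^2} \qquad (13.32)$$

and compare with the upper $\alpha$ level value of the F distribution with 
$\nu_1 = c - 1$ and $\nu_2 = \nu$ degrees of freedom where 

$$\nu = \frac{(s_4^2 + s_5^2 - s_7^2)^2}{\dfrac{(s_4^2)^2}{(c-1)(r-1)} + \dfrac{(s_5^2)^2}{(c-1)(l-1)} + \dfrac{(s_7^2)^2}{(c-1)(r-1)(l-1)}} \qquad (13.33)$$

Note that when the variance components $\sigma^2$, $\sigma_{\alpha\beta\gamma}^2$, $\sigma_{\alpha\beta}^2$, $\sigma_{\alpha\gamma}^2$, and $\sigma_\alpha^2$ are 
replaced by their estimates $s^2$, $s_{\alpha\beta\gamma}^2$, $s_{\alpha\beta}^2$, $s_{\alpha\gamma}^2$, and $s_\alpha^2$, respectively, the 
expression in (13.32) may be written as 

$$\frac{s^2 + n s_{\alpha\beta\gamma}^2 + ln\, s_{\alpha\beta}^2 + rn\, s_{\alpha\gamma}^2 + rln\, s_\alpha^2}{(s^2 + n s_{\alpha\beta\gamma}^2 + ln\, s_{\alpha\beta}^2) + (s^2 + n s_{\alpha\beta\gamma}^2 + rn\, s_{\alpha\gamma}^2) - (s^2 + n s_{\alpha\beta\gamma}^2)}$$

or 

$$\frac{(s^2 + n s_{\alpha\beta\gamma}^2 + ln\, s_{\alpha\beta}^2 + rn\, s_{\alpha\gamma}^2) + rln\, s_\alpha^2}{s^2 + n s_{\alpha\beta\gamma}^2 + ln\, s_{\alpha\beta}^2 + rn\, s_{\alpha\gamma}^2}$$

The hypotheses $H_{02}$ and $H_{03}$ may be tested by a similar method. 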
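
The arithmetic of the approximate test in (13.32) and (13.33) can be sketched in a few lines of Python. The function name, argument names, and numerical mean squares below are hypothetical and serve only as an illustration; ms_a, ms_ab, ms_ac, and ms_abc stand for $s_1^2$, $s_4^2$, $s_5^2$, and $s_7^2$. The function returns the ratio together with the two degrees of freedom with which it is referred to the F distribution; the critical value itself must still be taken from a table.

```python
def approximate_f_test(ms_a, ms_ab, ms_ac, ms_abc, c, r, l):
    """Ratio (13.32) and its degrees of freedom (c - 1, nu) with nu from (13.33)."""
    denominator = ms_ab + ms_ac - ms_abc
    if denominator <= 0:
        # The synthesized denominator can be non-positive in small samples;
        # the approximation is then not usable.
        raise ValueError("synthesized denominator is not positive")
    ratio = ms_a / denominator
    nu = denominator ** 2 / (
        ms_ab ** 2 / ((c - 1) * (r - 1))
        + ms_ac ** 2 / ((c - 1) * (l - 1))
        + ms_abc ** 2 / ((c - 1) * (r - 1) * (l - 1))
    )
    return ratio, c - 1, nu

# Hypothetical mean squares for c = 3, r = 3, l = 2.
ratio, nu1, nu2 = approximate_f_test(120.0, 30.0, 20.0, 10.0, c=3, r=3, l=2)
```

Since $\nu$ is generally not an integer, one would interpolate in the F table or, conservatively, use the next smaller integer.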

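For the analysis of the mixed-model three-factor factorial experiment, 
consider (1) the case where the effects $\alpha_i$ are fixed and all others are random, 
and (2) the case where the effects $\alpha_i$ and $\beta_j$ are fixed, the effects $(\alpha\beta)_{ij}$ also 
being fixed, and all others are random. The usual assumptions regarding the 
independence and normality of random components are made, and the 
resulting expected mean squares are shown in Table 13.27. On close 
examination of these expected mean squares, it is observed that exact tests exist for 
all hypotheses except 

$$\alpha_1 = \alpha_2 = \cdots = \alpha_c = 0 \qquad (13.34)$$

in the mixed-model ($\alpha$) experiment. An approximate test of (13.34) can be 
made by computing (13.32) and comparing it with the upper $\alpha$ level value 
of the F distribution with $\nu_1 = c - 1$ and $\nu_2 = \nu$ degrees of freedom, where 
$\nu$ is defined by Eq. (13.33). That is, the test of (13.34) is exactly like the test 
of the hypothesis that $\sigma_\alpha^2 = 0$. 


Table 13.27 

Expected Mean Squares for Two Mixed-Model Three-Factor Factorial Experiments 
in a Completely Randomized Design 

Source of         Mean       Expected Mean Squares for the 
Variation         Squares    Mixed Model ($\alpha$)                                                                                  Mixed Model ($\alpha$, $\beta$) 

Factor A          $s_1^2$    $\sigma^2 + n\sigma_{\alpha\beta\gamma}^2 + ln\sigma_{\alpha\beta}^2 + rn\sigma_{\alpha\gamma}^2 + rln \sum_i \alpha_i^2/(c-1)$    $\sigma^2 + rn\sigma_{\alpha\gamma}^2 + rln \sum_i \alpha_i^2/(c-1)$ 
Factor B          $s_2^2$    $\sigma^2 + cn\sigma_{\beta\gamma}^2 + cln\sigma_\beta^2$                                               $\sigma^2 + cn\sigma_{\beta\gamma}^2 + cln \sum_j \beta_j^2/(r-1)$ 
Factor C          $s_3^2$    $\sigma^2 + cn\sigma_{\beta\gamma}^2 + crn\sigma_\gamma^2$                                              $\sigma^2 + crn\sigma_\gamma^2$ 
A x B             $s_4^2$    $\sigma^2 + n\sigma_{\alpha\beta\gamma}^2 + ln\sigma_{\alpha\beta}^2$                                   $\sigma^2 + n\sigma_{\alpha\beta\gamma}^2 + ln \sum_i \sum_j (\alpha\beta)_{ij}^2/[(c-1)(r-1)]$ 
A x C             $s_5^2$    $\sigma^2 + n\sigma_{\alpha\beta\gamma}^2 + rn\sigma_{\alpha\gamma}^2$                                  $\sigma^2 + rn\sigma_{\alpha\gamma}^2$ 
B x C             $s_6^2$    $\sigma^2 + cn\sigma_{\beta\gamma}^2$                                                                   $\sigma^2 + cn\sigma_{\beta\gamma}^2$ 
A x B x C         $s_7^2$    $\sigma^2 + n\sigma_{\alpha\beta\gamma}^2$                                                              $\sigma^2 + n\sigma_{\alpha\beta\gamma}^2$ 
Within (error)    $s^2$      $\sigma^2$                                                                                              $\sigma^2$ 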

The analysis of variance and expected mean squares for factorials with 
more than three factors follow the same general pattern already discussed. 
Also, subsampling in completely randomized (or randomized block or Latin 
square) designs in which the treatments are combinations of levels of factors 
leads to the same type of analysis given in Sects. 11.4 and 12.7. 


13.8 SUMMARY REMARKS ON FACTORIALS 


Methods for partitioning sums of squares, ranking factor means, testing 
hypotheses with a single degree of freedom, and establishing confidence 
intervals for levels of a single factor in higher-order factorials are similar to 
those already explained for one-, two-, and three-factor experiments. Fixed 
effects and components of variance may be estimated with the aid of tables 
of means and expected mean squares. Power of the various tests may be 
determined along the lines given in earlier chapters. 

The question of pooling certain mean squares to increase the number 
of degrees of freedom of the error mean square often arises in analysis of 
variance. For example, on looking at the two models in Table 13.25, one 
might argue that failure to reject the hypothesis that $\sigma_{\alpha\beta}^2 = 0$ is evidence 
that $\sigma_{\alpha\beta}^2$ is so near zero that the mean squares $s_3^2$ and $s^2$ both estimate the 
error variance and, thus, should be pooled to obtain an estimate of error 
variance with more degrees of freedom. That is, one might propose the 
pooled mean square 

$$\frac{(c-1)(r-1)s_3^2 + cr(n-1)s^2}{(c-1)(r-1) + cr(n-1)} \qquad (13.35)$$

as an estimator of $\sigma^2$ with cr(n - 1) + (c - 1)(r - 1) degrees of freedom. 
Thus, the hypothesis for factor A could be tested by comparing $s_1^2$ with the 
residual mean square with more degrees of freedom, making the test more 
powerful. Such a procedure is not generally recommended, but there is 
evidence [2, 3, 4, 10, 27, 37, 38] to indicate that such pooling is justified in 
certain cases. 

Suppose we consider the test of $\sigma_{\alpha\beta}^2 = 0$ a "preliminary" test and the 
test of either $\sigma_\alpha^2 = 0$ or $\sum_i \alpha_i^2 = 0$ the "final" test, and suppose an $\alpha$ level 
final test is required. Then, in no case should an $\alpha$ level preliminary test be 
applied. Instead, the level for the preliminary test should be much greater. 
For example, for a five per cent level final test the preliminary test should 
be in the neighborhood of 50 per cent. The rationale for this is reasonably 
clear. If one fails to reject the hypothesis that $\sigma_{\alpha\beta}^2 = 0$ at such a high level 
as 50 per cent, there is a good chance that $\sigma_{\alpha\beta}^2$ is very near zero; with a lower 
significance level, one is not as sure that $\sigma_{\alpha\beta}^2$ is near zero. Examples of the 
pooling procedures are given by Paull [37, 38] and Bozivich, Bancroft, 
Hartley, and Huntsberger [10]. 
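
The pooling rule discussed above may be sketched as follows. This Python fragment is an illustration only: the function names and the numerical values are hypothetical, and the significance probability of the preliminary test (prelim_p) is assumed to have been obtained separately from an F table. When the interaction fails to reach significance at the high preliminary level, the interaction and within mean squares are pooled as in (13.35); otherwise the interaction mean square is retained as the error term for factor A.

```python
def pooled_mean_square(ms_int, df_int, ms_err, df_err):
    """Weighted average of two mean squares, as in Eq. (13.35)."""
    df = df_int + df_err
    return (df_int * ms_int + df_err * ms_err) / df, df

def error_term_for_a(ms_int, df_int, ms_err, df_err, prelim_p,
                     prelim_level=0.50):
    """Error mean square and d.f. for the final test of factor A.

    prelim_p is the significance probability of the preliminary test of
    the interaction; pooling takes place only when it exceeds prelim_level.
    """
    if prelim_p > prelim_level:
        return pooled_mean_square(ms_int, df_int, ms_err, df_err)
    return ms_int, df_int

# Hypothetical values for c = 3, r = 4, n = 2:
# interaction d.f. = (c - 1)(r - 1) = 6, within d.f. = cr(n - 1) = 12.
pooled = error_term_for_a(10.0, 6, 5.0, 12, prelim_p=0.70)
not_pooled = error_term_for_a(10.0, 6, 5.0, 12, prelim_p=0.20)
```

The default preliminary level of 0.50 follows the recommendation in the text for a five per cent final test; other final levels would call for a different choice.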

In the above treatment of factorials we were concerned with presenting 
the model, analysis, and certain applications; we did not discuss such 
problems as the choice of levels, the selection of treatment combinations 
giving the largest (or smallest or specified) response, or the design necessary 
to get a “picture” of the total “response surface” over the region of interest. 
The choice of factors and levels of factors depends on the objectives and the 
cost of the experiment. These topics are discussed in Refs. [14, 48]. The 
factorial experiment may be preliminary to the main objective. For example, 
the factorial may be used to estimate the particular combinations of levels 
of factors which give the maximum (or minimum or specified) response, 
and then a further experiment may be made to determine a better estimate 
of the desired response. Sometimes two such experiments are combined into 
one. These and similar problems are discussed in Refs. [6, 16, 17, 22, 42, 55]. 
Special procedures for studying the response surface are given by Box 
[8, 9]. Further references on fixed, mixed, and random models are cited in 
Refs. [44, 45, 52, 53], and special techniques for finding expected mean 
squares are given in Refs. [25, 47]. 

Certain advantages and disadvantages of the factorial have already 
become apparent. We now list these and others for quick reference. Some 
advantages are 

1. The interactions of two or more factors may be examined 

2. The most efficient use is made of resources, in that all responses are 
used in estimating effects and mean squares 

3. The results of the experiment may be applied over a wider range of 
conditions 

4. There are usually more degrees of freedom for the residual mean 
square, this being particularly true in the fixed model 

Some disadvantages are 

1. The number of treatment combinations required to study several 
factors at several levels is large and often prohibitive 

2. There may be no interest in certain treatment combinations 

3. The experiment and computations are more complicated 

Some of the disadvantages, particularly the first one, may be overcome by 
applying one of the experiments of Sect. 13.9. For elaboration on these and 
other advantages and disadvantages, the reader is referred to Refs. [12, 15, 
19, 30]. 

13.9 CONFOUNDING IN FACTORIAL EXPERIMENTS 

The number of treatment combinations in a factorial experiment can 
become quite large. If every treatment combination appears in every block 
of a randomized block design, the large number of experimental units within 
each block may cause considerable increase in the error variance. In order 
to reduce the error variance, Fisher [22] developed a method in which all 
treatment combinations are placed in two or more blocks and the error 
variance is still dependent only on the variation within blocks. This is done 
at the expense of losing total or partial information on one or more of the 
treatment comparisons, that is, factors or interactions. Since in such a design 
the variation of certain treatment comparisons is linked together (mixed 
up) with block-to-block variation, these comparisons are said to be confounded 
with blocks. The blocks which contain only a fraction of the treatment 
combinations are called incomplete blocks [7, 12, 13, 19, 24, 30]. 

We illustrate the principle of confounding with a $2^3$ factorial. Actually, 
eight treatments should usually be placed in each block of a randomized 
block design, but we use the small $2^3$ factorial simply to get across the 
concepts with minimum writing. Before we explain the details of confounding 
it is desirable that we look at seven particular orthogonal treatment 
comparisons with a single degree of freedom, which come from the eight treatment 
combinations in the usual randomized block design replicated n times. 

For treatment combinations 

$A_1B_1C_1$, $A_1B_1C_2$, $A_1B_2C_1$, $A_1B_2C_2$, $A_2B_1C_1$, $A_2B_1C_2$, $A_2B_2C_1$, $A_2B_2C_2$ 

the corresponding totals may be denoted by 

$$T_{111}, \; T_{112}, \; T_{121}, \; T_{122}, \; T_{211}, \; T_{212}, \; T_{221}, \; T_{222} \qquad (13.36)$$

respectively, where the dots indicating summation are omitted. Comparisons 
of means or totals are determined by multipliers. For example, the 
multipliers for the comparison of the lower level with the higher level of A are 

$$-1, \; -1, \; -1, \; -1, \; 1, \; 1, \; 1, \; 1 \qquad (13.37)$$

It is easy to show that the mean square for factor A in a regular analysis of 
variance table (see Table 13.11) is the same as 

$$Q_A = \frac{(-T_{111} - T_{112} - T_{121} - T_{122} + T_{211} + T_{212} + T_{221} + T_{222})^2}{n[(-1)^2 + (-1)^2 + \cdots + 1^2]} \qquad (13.38)$$

the mean square for the comparison with the multipliers in (13.37). According 
to Eq. (13.22) the sum of squares for factor A is 

$$SSA = \frac{T_{1..}^2 + T_{2..}^2}{2 \cdot 2 \cdot n} - \frac{(T_{111} + T_{112} + \cdots + T_{222})^2}{2 \cdot 2 \cdot 2 \cdot n} \qquad (13.39)$$

Since 

$$T_{i..} = T_{i11} + T_{i12} + T_{i21} + T_{i22} \qquad (i = 1, 2)$$

it follows that 

$$Q_A = \frac{(-T_{1..} + T_{2..})^2}{8n}$$

and 

$$SSA = \frac{1}{8n}\left[2T_{1..}^2 + 2T_{2..}^2 - (T_{1..} + T_{2..})^2\right] = \frac{1}{8n}\left(T_{1..}^2 + T_{2..}^2 - 2T_{1..}T_{2..}\right) = Q_A$$

Therefore, the mean square for factor A, $s_1^2$, is equal to $Q_A$, since A has only 
two levels. 

Seven sets of multipliers for orthogonal comparisons along with
corresponding source of variation are shown in Table 13.28.

                                 Table 13.28

               Multipliers for Orthogonal Treatment Comparisons

Source of                       Treatment Combinations
Variation   A1B1C1  A1B1C2  A1B2C1  A1B2C2  A2B1C1  A2B1C2  A2B2C1  A2B2C2

A             -1      -1      -1      -1       1       1       1       1
B             -1      -1       1       1      -1      -1       1       1
C             -1       1      -1       1      -1       1      -1       1
AB             1       1      -1      -1      -1      -1       1       1
AC             1      -1       1      -1      -1       1      -1       1
BC             1      -1      -1       1       1      -1      -1       1
ABC           -1       1       1      -1       1      -1      -1       1

Using a technique similar to the above, we can show that the other mean
squares for treatment in Table 13.11 are equal to the Q's for six other
orthogonal treatment comparisons. That is, it can be shown that $s_B^2$,
$s_C^2$, $s_{AB}^2$, $s_{AC}^2$, $s_{BC}^2$, $s_{ABC}^2$ are the same as
$Q_B$, $Q_C$, $Q_{AB}$, $Q_{AC}$, $Q_{BC}$, $Q_{ABC}$, respectively. The
analysis of variance for the eight treatment combinations randomized in three
blocks with eight units each is a particular case of that given in Table
13.11. To test the seven hypotheses indicated, each treatment comparison mean
square is compared with an error mean square with 16 degrees of freedom.
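The rows of Table 13.28 can be generated rather than memorized: each
interaction row is the elementwise product of the main-effect rows it
involves. The sketch below, in Python, rebuilds the seven rows and checks
that they are mutually orthogonal. The helper names `sign` and `multipliers`
are illustrative assumptions, not notation from the text.

```python
from itertools import combinations, product

# (A, B, C) levels in the column order of Table 13.28.
combos = list(product((1, 2), repeat=3))


def sign(level):
    """Lower level gets multiplier -1, higher level gets +1."""
    return -1 if level == 1 else 1


def multipliers(effect):
    """Row of multipliers for an effect such as 'A', 'BC' or 'ABC'."""
    row = []
    for a, b, c in combos:
        signs = {"A": sign(a), "B": sign(b), "C": sign(c)}
        value = 1
        for letter in effect:
            value *= signs[letter]
        row.append(value)
    return row


effects = ["A", "B", "C", "AB", "AC", "BC", "ABC"]
table = {effect: multipliers(effect) for effect in effects}

for effect in effects:
    print(f"{effect:>3}", table[effect])

# Two comparisons are orthogonal when the sum of products of their
# multipliers is zero.
orthogonal = all(
    sum(x * y for x, y in zip(table[e1], table[e2])) == 0
    for e1, e2 in combinations(effects, 2)
)
print("all pairs orthogonal:", orthogonal)
```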

If, in a particular experiment, it is not possible to obtain blocks with
eight homogeneous experimental units, then an effort should be made to use
blocks with fewer units. Suppose blocks of four fairly homogeneous
experimental units may be obtained. Then four selected treatment combinations
could be randomly placed in one block, and the remaining four placed in a
second block to make a complete replicate of the experiment. For example, the
four treatment combinations with lower levels of A could be placed in one
block and those with upper levels of A in a second block. Then, if one
attempted to obtain the mean square for factor A, the variation due to the
two blocks would be mixed with the variation due to A. That is, the
comparison for factor A is computed in the same way as the comparison between
blocks. In such a case we say that factor A is confounded with blocks and
thus cannot be analyzed (or separated out) in the experiment. If one of the
factors A, B, C, sometimes called main effects, is confounded with blocks, we
say that the experiment is a split plot. Clearly, any one of the sources of
variation in Table 13.28 could be confounded with blocks. Often one of the
two-factor or three-factor interaction comparisons which is considered
unimportant is confounded with blocks.

Figure 13.3 gives an experimental layout for eight treatment combinations
replicated three times in six blocks with four experimental units in which
the three-factor interaction comparison is confounded with blocks. When the
same comparison is confounded in every replication, we say that there is
complete confounding. Thus, for the incomplete block design in Fig. 13.3, the
ABC interaction comparison is completely confounded with blocks. Note that
all treatment combinations with positive multipliers are randomized in one
incomplete block and that those with negative multipliers are randomized in
the other incomplete block of each replication.

[Figure: Replicates 1, 2, and 3, each divided into Block 1 and Block 2 of
four treatment combinations]

Fig. 13.3 Layout for a $2^3$ Factorial Experiment in which the Three-factor
Interaction Comparison is Completely Confounded with Blocks
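The rule just stated (positive ABC multipliers in one incomplete block,
negative in the other) is easy to mechanize. The following is a minimal
sketch in Python; the function `layout`, the seed, and the random assignment
of halves to Block 1 and Block 2 are assumptions for illustration and do not
reproduce the particular randomization of Fig. 13.3.

```python
import random
from itertools import product


def abc_multiplier(combo):
    """Multiplier of the ABC comparison for levels (a, b, c) in {1, 2}."""
    value = 1
    for level in combo:
        value *= -1 if level == 1 else 1
    return value


combos = list(product((1, 2), repeat=3))
positive = [c for c in combos if abc_multiplier(c) > 0]
negative = [c for c in combos if abc_multiplier(c) < 0]


def layout(replicates, seed=0):
    """Randomize each half within its block for every replicate."""
    rng = random.Random(seed)
    plan = []
    for _ in range(replicates):
        blocks = [positive[:], negative[:]]
        rng.shuffle(blocks)          # which half becomes Block 1
        for block in blocks:
            rng.shuffle(block)       # order of units inside the block
        plan.append(blocks)
    return plan


for rep, (block1, block2) in enumerate(layout(3), start=1):
    print(f"Replicate {rep}: Block 1 {block1}  Block 2 {block2}")
```

Because every block holds treatment combinations of one ABC sign only, the
ABC comparison within a replicate is exactly the difference between its two
block totals, which is what complete confounding means here.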

If all main and interaction effects are important, but some are of less
importance than others, then it is possible to design the experiment so that
full information may be obtained on some and partial information on those
of least importance. Figure 13.4 gives an experimental layout for eight
treatment combinations replicated three times in six blocks with four
experimental units in which the AB interaction is confounded with blocks in
the first replicate, the AC interaction is confounded with blocks in the
second replicate, and the BC interaction is confounded with blocks in the
third replicate. For the incomplete block design in Fig. 13.4 we say there is
partial confounding. Actually, since each two-factor interaction comparison
is confounded and is confounded the same number of times, the term balanced
partial confounding is often applied. Also, when the four interaction
comparisons or the three main effects comparisons are partially confounded
the same number of times in incomplete blocks, we say that there is balanced
partial confounding.

[Figure: Replicate 1 (AB Confounded), Replicate 2 (AC Confounded), and
Replicate 3 (BC Confounded), each divided into Block 1 and Block 2 of four
treatment combinations]

Fig. 13.4 Layout for a $2^3$ Factorial Experiment in which the Three
Two-factor Interaction Comparisons are Partially Confounded with Blocks

It should be noted that in each replicate of Figs. 13.3 and 13.4 only one
treatment comparison is confounded with incomplete blocks. For example,
in the second replicate of Fig. 13.4 the AC interaction comparison is
confounded but the other six treatment comparisons are not confounded, since
of the four treatment combinations in each of two incomplete blocks exactly
two have positive and two negative multipliers.

                   Table 13.29

   Analysis of the Experiment of Figure 13.3

Source of Variation        Degrees of Freedom

Replicate                          2
ABC (or block)                     1
Replicate x ABC                    2
Treatments                         6
    A                              1
    B                              1
    C                              1
    AB                             1
    AC                             1
    BC                             1
Residual                          12
Total                             23

                   Table 13.30

   Analysis of the Experiment of Figure 13.4

Source of Variation        Degrees of Freedom

Blocks                             5
A                                  1
B                                  1
C                                  1
ABC                                1
AB                                 1
AC                                 1
BC                                 1
Residual                          11
Total                             23

The source of variation and partition of degrees of freedom for the
experiments of Figs. 13.3 and 13.4 are shown in Tables 13.29 and 13.30,
respectively. All mean squares for treatment comparisons which are either
not confounded or partially confounded are tested against the residual mean
square. The sums of squares (or mean squares) in Table 13.29 are computed
by the usual technique. In Table 13.30 all treatment sums of squares except
the three two-factor interaction comparisons which are partially confounded
are found in the usual way. The sum of squares for the AB comparison is
found by using only totals from replicates 2 and 3. In computing the sum
of squares for the treatment comparisons which are not confounded the
divisor is 24; for those which are partially confounded the divisor is 16.
Thus the treatment comparisons which are not confounded are obtained
with higher precision than those which are partially confounded, the ratio
of precision being 3 to 2. Since the different divisors take care of the
different precision, no further correction is required in the analysis of
variance and test procedures.
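The divisors 24 and 16 follow directly from how many replicates contribute to
a comparison. The following is a minimal sketch in Python, assuming one
observation per treatment combination per replicate and using made-up
observations; the function name `comparison_ss` and its argument
`confounded_reps` are illustrative and not from the text.

```python
def comparison_ss(observations, multipliers, confounded_reps=()):
    """Sum of squares and divisor for a single-degree-of-freedom comparison.

    observations: one list of eight values per replicate, in the column
    order of Table 13.28. Replicates listed in confounded_reps (1-based)
    are left out, as for a partially confounded comparison.
    """
    used = [obs for rep, obs in enumerate(observations, start=1)
            if rep not in confounded_reps]
    totals = [sum(column) for column in zip(*used)]
    value = sum(m * t for m, t in zip(multipliers, totals))
    divisor = len(used) * sum(m * m for m in multipliers)
    return value ** 2 / divisor, divisor


# Hypothetical observations for three replicates.
observations = [
    [12, 15, 11, 14, 18, 21, 17, 22],
    [13, 14, 12, 16, 19, 20, 18, 21],
    [11, 16, 10, 15, 17, 22, 16, 23],
]
mult_a = [-1, -1, -1, -1, 1, 1, 1, 1]
mult_ab = [1, 1, -1, -1, -1, -1, 1, 1]

ss_a, divisor_a = comparison_ss(observations, mult_a)
# In Fig. 13.4 the AB comparison is confounded in replicate 1 only.
ss_ab, divisor_ab = comparison_ss(observations, mult_ab, confounded_reps=(1,))
print(divisor_a, divisor_ab, divisor_a / divisor_ab)
```

The ratio of the two divisors, 24 to 16, is the 3 to 2 ratio of precision
mentioned above.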

We have illustrated the principle of confounding in a very simple case.
It is possible to confound $3^n$ factorials in replicates with three blocks,
$2^n$ factorials in replicates with four blocks, etc. Different experimental
models, relative efficiency, missing values, and many other topics could be
discussed. The reader is referred to Refs. [5, 12, 15, 19, 28, 30, 33, 34,
35, 39, 40, 55, 56, 57] for further study of these and other topics.

The split plot design is particularly important in experiments in which
the levels of one factor, say A, require larger experimental units than other
factors, say B, C. For example, in the study of some property of alloys,
the differences in furnaces A may require large amounts of material and the
differences in molds and specific gravity, B and C, relatively small amounts.
In another example, in the stimulus response of plants, one factor might be
for differences in whole plants A and another for differences in leaves B.
Two factors which affect the response in a laboratory experiment with
students might be solution A in a large quantity and an additive B to a
relatively small sample. The analysis for split plots is similar to those
already given in Tables 13.29 and 13.30 except that there are two error
terms, one for the "whole plot" factor A and another for the "split plot"
factor (or factors). If more than two sizes of experimental units are
required, the plots are split accordingly. Thus split-split plots and higher
order split plots are also used. For details on these and other topics the
reader is referred to Refs. [1, 5, 11, 12, 18, 19, 30, 31, 49, 50, 51, 55].

In experiments treated so far in this chapter all treatment combinations
are arranged in some design. For a large number of factors, say more than
four, the number of treatment combinations as well as the number of possible
tests becomes quite large. By confounding we reduced both the number of 
experimental units in a block and the number of test procedures. But when 
there are five or more factors, there is often a need for a reduction in the 
number of treatment combinations in the experiment. If only the main effects 
and certain interactions are of interest, it is possible to analyze these effects 
when only a fraction of the treatment combinations are selected. For example, 
in an experiment involving only the first block for each replicate in Fig. 13.3, 
we could still obtain information on all treatment comparisons except the 
ABC interaction comparison. Such an experiment is called a fractional 
factorial. That is, an experiment in which the treatment combinations of a 
block are equivalent to those in one block of a replicate in a system of 
confounding is called a fractional factorial. Details of fractional factorials 
are discussed in Refs. [15, 20, 21, 29, 30, 32, 39]. 
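A short sketch in Python of the half replicate described above may help. It
assumes, for illustration only, that the block retained is the one holding
the treatment combinations with positive ABC multipliers (which block is
labelled "first" in Fig. 13.3 is not recoverable here); the helper names are
hypothetical.

```python
from itertools import product


def sign(level):
    """Lower level gets multiplier -1, higher level gets +1."""
    return -1 if level == 1 else 1


def multiplier(effect, combo):
    """Multiplier of an effect such as 'AB' for levels (a, b, c)."""
    signs = dict(zip("ABC", (sign(level) for level in combo)))
    value = 1
    for letter in effect:
        value *= signs[letter]
    return value


combos = list(product((1, 2), repeat=3))
# Keep only one block of the ABC confounding system: a half replicate.
half = [c for c in combos if multiplier("ABC", c) == 1]

effects = ["A", "B", "C", "AB", "AC", "BC", "ABC"]
rows = {e: [multiplier(e, c) for c in half] for e in effects}
for effect in effects:
    print(f"{effect:>3}", rows[effect])
```

Within the half replicate the ABC multipliers are all equal, so that
comparison cannot be formed, while each of the other six rows still has two
positive and two negative multipliers.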

13.10. EXERCISES 

13.1. The data in Table 13.31 are for a fixed-effects 2 x 3 factorial
experiment in a completely randomized design. (a) Express each of the six
observations for replication 1 as the sum of the over-all mean, an A
effect, a B effect, an AB interaction effect, and an error. (b) Prepare an
analysis of variance table showing the expected mean squares. (c) Test
the three hypotheses in (13.3), showing all steps in the general test
procedure. (d) Identify the two factors with variables in your own field
of study, write a short statement of an experiment which could lead to
the above analysis of variance, and write your conclusion in terms of
the variables introduced. (e) Find a 95 per cent confidence interval for
the difference in the means of the two levels of factor A. (f) Find a 95
per cent confidence interval for each of the three means for the levels
of factor B.

                           Table 13.31

Levels of factor A               1                      2
Levels of factor B          1     2     3          1     2     3
Treatment combination     A1B1  A1B2  A1B3       A2B1  A2B2  A2B3

Replication 1              25    23    18         26    10    18
Replication 2              28    22    23         20    13    17
Replication 3              25    21    22         20    13    16

13.2. The data in Table 13.32 are for a fixed-effects 2x3 factorial experiment 
in a randomized block design, (a) Express each of the six observations 
for treatment combinations AiBj and A,B, as the sum of the over-all 
mean, a block effect, an A effect, a B effect, an AB interaction effect, 
and an error, (b) Prepare an analysis of variance table, showing the 
expected mean squares, (c) Test the three hypotheses for the equality of 
Table 13.32

factor A effects, factor B effects, and interaction effects. (d) Find a
90 per cent confidence interval for each of the following: (1) means for
levels of factor A, (2) differences in means for levels of B, and (3) mean
of the treatment combination A,D, in block 2.

13.3. The true population means for 15 treatment combinations are given in
Table 13.33. (a) Find the true effects for factor A, factor B, and
interaction of A and B. (b) Find the value of

and 


                   Table 13.33

Levels of               Levels of factor A
factor B         1      2      3      4      5

   1            66     62     55     47     42
   2            54     54     53     42     37
   3            51     49     45     46     44

Use this information to compare the effects of the two factors.

13.4. Prove the sum of squares identity (13.1).

13.5. Prove that $a_i$, $b_j$, and $(ab)_{ij}$ are unbiased estimators of
$\alpha_i$, $\beta_j$, and $(\alpha\beta)_{ij}$, respectively. Assume a
fixed-effects c x r factorial experiment in a completely randomized
design.

13.6. In Table 13.7, derive the expected mean square for

13.7. The data in Table 13.34 are for a fixed-effects 3 x 2 factorial
experiment in which six random samples, each of size two, were drawn from
a normal population with mean 50 and variance 100. (a) Prepare an


TaUe 13 34 



2 20 ^ 7 -/*)’ 
Icr-l) 


("e*^ tT"^’ ic iKr- I) 


62 63 

45 34 


53 59 

46 39 


61 46 
59 46 




analysis of variance table and test each of the hypotheses in (13.3) at 
the five per cent level. What is the significance level for the whole experi- 
ment? Since you know the source of the six samples, indicate whether 
a type 1 or type 2 error was committed. (b) Compute the true and
estimated effects of factor A and of the interaction of A and B. Compare
the true and estimated effects. Draw graphs to compare the true and 
estimated interaction effects, (c) Add 50 to each observation of the 
treatment combination, 100 to each observation of the A^B^ treatment 
combination, and leave the other eight observations unchanged. Prepare 
an analysis of variance table and test each of the hypotheses in (13.3) 
at the five per cent level. Since you know the sources of the six samples, 
indicate whether a type I or type 2 error was committed in each test. 

(d) Use the data in (c) to do (b). (e) Add 50 to each observation of the 
AiBs, A^B,, and A^B, treatment combinations and 100 to each observa- 
tion of the A,B, treatment combination, leaving the other four observa- 
tions unchanged. Prepare an analysis of variance table and test each of 
the hypotheses in (13.3) at the five per cent level. Indicate whether a 
type 1 or type 2 error was committed in each test, (f) Use the data in 

(e) to do (b). (g) Find the numerical value of the expected mean squares 
for (a), (c), and (e). Use these expected mean squares to explain any 
differences in the actual mean squares found in (a), (c), and (e). 

13.8. The data in Table 13.35 are for a fixed-effects 5 x 3 factorial
experiment in a completely randomized design (the reader may think of the
levels of A as different temperatures, the levels of B as different
fabrics, and the observations as coded values for percentage of shrinkage
during dyeing). (a) Prepare an analysis of variance table. (b) Find 90 per
cent confidence intervals for each of the 15 cell means. Since the
observations were taken from populations with means shown in Exercise
13.3, determine how many intervals include the true mean. (c) Use
Duncan's range test to rank the means for the five levels of factor A.
Also, use Duncan's range test to rank the means for the 15 cells. Compare
both rankings with the true rankings obtained from Exercise 13.3. (d) Use
the true means for the five levels of factor A to write four orthogonal
comparisons. Use the data of this exercise to determine 95 per cent
confidence intervals for these comparisons. In each case, check to see
if the true value for the comparison falls in the interval. (e) Find a 95
per cent confidence interval for the difference in the mean of the third
level of factor A and the mean of the combination of the second level
of factor A and the third level of factor B. Indicate how such
information might be useful. (f) Discuss the interaction effects in terms
of a graph. Assume that a small percentage of shrinkage is desirable.
(g) Assuming the levels of B to be random and those of A to be fixed,
test the hypothesis that the A effects are equal. Also, find a 95 per
cent confidence interval for the mean of the fifth level of factor A.
(h) Assuming the first replications form one block and the second
replications a second block, test the hypotheses in (13.3). Assume a
fixed-effects experiment.

                        Table 13.35

                                      Factor A
Factor B    Replication      1     2     3     4     5

   1             1          50    89    70    50    34
                 2          64    59    53    56    49
   2             1          39    57    44    41    26
                 2          51    82    60    32    38
   3             1          36    53    42    51    44
                 2          44    36    31    47    40
13 9 Derive the expected mean squares for and sj shown m Table 13 10 
13.10. The data in Table 13.36 are for a fixed-effects 2 x 2 factorial in a
Latin square design. (a) Prepare an analysis of variance table and test the
three hypotheses of (13.3). (b) Identify the two factors, rows, and columns
with variables in your own field of study, write a short statement of an

Table I3J6 



1 

Cclueins 

2 } 

4 

1 


29U,6,) 

10(4,8,) 

I9(d,8,) 

^ 2 


20(A,Bt) 

14(4,8.) 

22(4,8.) 

3 


20[A,B,) 

25(4,8.) 

17(4,8,) 

4 

2t(A,8,) 

tHA,6,) 

23(4,8,) 

14(4,8.) 


experiment which could lead to the above analysis, and write your
conclusions in terms of the variables introduced.

13.11. Write the model equation for an A x B factorial experiment in a Latin
square design. Derive the sum of squares identity for the usual analysis
of variance. Make an analysis of variance table showing the expected
mean squares.

13.12. (a) Complete Table 13.37 for the analysis of variance and expected
mean squares of a mixed model (/9) factorial experiment in a randomized
block design (the reader may assume that the computations are for any
three-factor factorial experiment he chooses). (b) Use the analysis of
variance table, Table 13.37, to test all hypotheses concerning main
effects, two-factor interaction effects, and three-factor interaction
effects. (c) Find unbiased point estimates of all the variance components
in (a).
Find 90 per cent confidence intervals for <r.a^ <r„ and <r. (d) Test alt 
hypotheses m (b) assuming a mixed model 7) Compare the results 
in (b) and (d) 
Table 13.37


Source of 
Variation 

Sum of Degrees of Mean Expected 

Sauares Freedom Squares Mean Squares 

Blocks 

2 28 

A 

160 

B 

2 210 

C 


AB 

170 2 

AC 

84 

BC 

6 180 

■ ABC 

228 

Error 

15 

Total 

3338 


13.13. In a study to determine the effect of age and type of soil on the 
unconfined compressive strength of mixtures of cement with soil, the 
data in Table 13.38 were obtained. (The compressive strength was in 
pounds per square inch, and ten per cent of each mixture was cement. 
Soils Leighton Buzzard sand, Tunstall Common A, Tunstall Common 
B, and Whitchurch are denoted by I, II, III, and IV, respectively.) 

                              Table 13.38

                                       Batch*
Age       Soil      A      B      C      D      E      F      G      H

7 days    I        490    380    345    295    380    350    410    335
          II       520    460     45     75     60    220     80    265
          III      590    400     20    250     20    430    175    500
          IV       810    620    395    550     90    540    700    905

28 days   I        830    760    720    660    670    640    860    750
          II      1020    870    750    770    940    800    870    980
          III      970    680    860    670    620    845    770    830
          IV      1500   1210   1330   1040   1000    930   1130   1270

* Data from P. T. Sherwood, "Rapid Method for Detecting the Presence of
Deleterious Organic Matter in Soil-Cement," Journal of Applied Chemistry,
Vol. 12 (1962), pp. 279-88.


Prepare an analysis of variance table to use in discussing this experiment. 
Write a detailed report, bringing out any information which seems 
pertinent to you. You should note such things as possible differences 
in error variance for 7 and 28 days, variability in batches, comparison 
of sand with the other three soils, and size of confidence intervals. 
Assume that the batches are random. 

13.14. An investigation was made to determine the sources of variation in the 
wool content of nominally ten per cent wool blankets. Table 13.39 gives 
the percentage of wool in 16 pieces of blanket, eight for each color, four
for each batch, and two for each loom. (The batches, looms, and pieces
were randomly selected.) Estimate and discuss the sources of variation.
Hint: This type of problem was discussed in an earlier chapter, and
should be carefully distinguished from a factorial experiment.


Table 13J9* 




f 

Piece 

Percentage 

Material 

ttamber 

Number 

Number 

of Woo! 




1 

101 



* 

2 

; 97 


1 * 


3 

1 142 

^ 



4 

151 



5 

147 





6 

: 149 




7 

101 




8 

14 4 



5 

9 

106 


3 

10 

10 3 



11 

125 

Yellow 


* 

12 

12.4 



13 

103 




14 

118 




IS 

12 8 




16 

114 


* Data from W. S. Connor, "Locating Important Sources of Variation,"
Industrial and Engineering Chemistry, Vol. 53, No. 12 (1961), pp. 73A-74A.


13.15. Give graphic interpretations of all the interactions in Example 13.3.

13.16. Let A denote breakers, B gaugers, C curing time, and D mixes. Coded
data for the compressive strength of hardened cement cubes for such a
$2^4$ fixed-effects factorial experiment in a completely randomized design
are given in Table 13.40. (a) Prepare the analysis of variance table for
these data, applying the usual techniques. (b) Prepare the analysis of
variance table for these data, using the systematic method due to Yates
[56, 57].

Table 13.40

Hint. List the 16 cell totals in the order indicated in Table 13.41. (The 
treatment combination $A_1B_2C_2D_1$ is written as 1221, etc.) The first eight
entries in column (3) are the sums 126 + 75, 99 + 75, . . . , 96 + 72,
and the second eight entries are the differences 75 — 126, 75 — 99, . . . , 
72 - 96. The entries in column (4) are found by operating on column 
(3) in the same way. The process is continued until all entries through 
column (6) are found. Finally, an entry in column (7) is found by squaring 
the corresponding entry in column (6) and dividing by (3)(16). The 
reader can now finish this table and justify the computations. For 
example, show that 1692.2 is the sum of squares for the main effect 
A. (c) Test the indicated hypotheses and write a careful summary 
statement for the experiment, explaining any significant main effects 
and interactions in terms of breakers, gaugers, curing time, and mixes, 
(d) After pooling all treatment mean squares with the error mean 
square which you consider appropriate, discuss (c) in terms of the 
new error mean square. 


                               Table 13.41

Treatment     Totals in     Sums and Differences of Pairs
Combination   16 Cells                                        [(6)]^2 / 48
    (1)          (2)        (3)     (4)     (5)     (6)           (7)

   1111          126        201     375     780    1443
   2111           75        174     405     663    -285      1692.2 (A)
   1211           99        201     273    -120    -195       792.2 (B)
   2211           75        204                                     (AB)
   1121          117        195                                     (C)
   2121           84         78                                     (AC)
   1221          108        222                                     (BC)
   2221           96        168                                     (ABC)
   1112          123        -51                                     (D)
   2112           72        -24                                     (AD)
   1212           60        -33                                     (BD)
   2212           18        -12                                     (ABD)
   1122          135        -51                                     (CD)
   2122           87        -42                                     (ACD)
   1222           96        -48                                     (BCD)
   2222           72        -24      24      15      21             (ABCD)
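The sums-and-differences procedure described in the hint can be written as a
short loop. The following is a minimal sketch in Python using the 16 cell
totals of column (2); the function name `yates` and the variable names are
illustrative. The divisor 48 = (3)(16) is the one given in the hint.

```python
def yates(totals):
    """Return the successive sum-and-difference columns of Yates' method.

    Each new column holds the sums of adjacent pairs of the previous
    column, followed by the differences (second minus first) of the
    same pairs. For 2**k totals the process is applied k times.
    """
    column = list(totals)
    k = len(column).bit_length() - 1
    columns = []
    for _ in range(k):
        pairs = [(column[i], column[i + 1]) for i in range(0, len(column), 2)]
        column = [a + b for a, b in pairs] + [b - a for a, b in pairs]
        columns.append(column)
    return columns


# Column (2) of Table 13.41, in the order 1111, 2111, 1211, ..., 2222.
cell_totals = [126, 75, 99, 75, 117, 84, 108, 96,
               123, 72, 60, 18, 135, 87, 96, 72]
labels = ["Total", "A", "B", "AB", "C", "AC", "BC", "ABC",
          "D", "AD", "BD", "ABD", "CD", "ACD", "BCD", "ABCD"]

columns = yates(cell_totals)      # columns (3), (4), (5), (6)
column6 = columns[-1]
sums_of_squares = {label: value ** 2 / 48
                   for label, value in zip(labels[1:], column6[1:])}

for label in labels[1:]:
    print(f"{label:>4} {sums_of_squares[label]:10.1f}")
```

The first entry of column (6) is the grand total; the remaining fifteen
entries are the values of the comparisons in the order listed in column (7).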


13.17. Let A denote concentration of a detergent, B concentration of
sodium carbonate, and C concentration of sodium carboxy-methyl
cellulose. Table 13.42 gives data for the cleaning ability of a solution in
washing tests for a single replication of a fixed-effects $3^3$ factorial
design (large numbers indicate improved cleaning ability). (a) Prepare an

Table 13.42*


Detergent 1 

2 

X, 1 

x» 

Sodium Carbonate , 

B, S, S, 

Bi e, B, 1 

B, B, B, 


C, 

lOd 197 22) 

198 329 320 

270 361 321 

Sodium CMC 1 

1 

149 255 294 

24) 364 410 1 

315 390 415 



182 259 297 

232 389 416 

340 406 387 


* Data from Fu«I| and Wagg, ‘'Statistical Methods m Detergency Investi* 
gatwm,’* IJeieortA, Vol 2 (1949). p 3)4 


analysis of variance table in which the sums of squares for interactions
AC and ABC are pooled and used for the error mean square in testing
all other two-factor interaction effects and all main effects. Write a
summary report of your findings. (b) For each main effect sum of squares
it is possible to define two orthogonal linear comparisons, each with a
single degree of freedom. The comparisons with multipliers 1, 0, -1
and 1, -2, 1 are called linear and quadratic components, and, for factor
A, are designated by $A_L$ and $A_Q$, respectively. Compute and interpret
the variation due to $A_L$, $A_Q$, $B_L$, $B_Q$, $C_L$, $C_Q$, $A_LB_L$,
$A_LB_Q$, $A_QB_L$, $A_QB_Q$, $B_LC_L$, $B_LC_Q$, $B_QC_L$, and $B_QC_Q$.
(c) Use the systematic method due to Yates to compute the sums of squares
in (a). It would be very informative if the reader determined a good
computational technique, using Exercise 13.16(b) as a guide.

13.18. Derive the expected mean squares in Table 13.25.

13.19. Use an appropriate pooling technique to increase the degrees of
freedom in the error mean square of Exercise 13.17 and to discuss the
significance of the comparisons in Exercise 13.17(b).

13.20. Determine an experimental layout for a $2^4$ factorial experiment in
which the three-factor interaction comparison BCD is completely confounded
with blocks. Give two replicates. Write the appropriate analysis of
variance, indicating only the sources of variation and degrees of freedom.
Show how typical sums of squares are computed.

13.21. (a) Determine an experimental layout for a $2^4$ factorial experiment
in which the three-factor interaction comparisons ABC, ABD, and ACD
are partially confounded with blocks. (b) Write the appropriate analysis
of variance and show how typical sums of squares are computed.

13.22. The analysis of variance for a split-plot experiment with c whole
plots A, r split plots B, and b blocks (or replicates) is given in Table
13.43. An experimental layout and the observations for two subplots within
each of four whole plots replicated three times is given in Table 13.44.
(a) Write the model equation and prepare an analysis of variance table
for these data. (b) State and test hypotheses about factor A, factor B,
and interaction of A and B. Write a summary statement for this experiment
in terms of factors in your own field. (c) State and prove the sum of
squares identity for a split-plot experiment.
                              Table 13.43

Source of             Degrees of          Expected Mean Squares for
Variation             Freedom             Fixed-Effects Model

Blocks                b - 1
A                     c - 1               $\sigma^2 + r\sigma_w^2 + br\sum\alpha_i^2/(c - 1)$
Whole-plot error      (b - 1)(c - 1)
B                     r - 1               $\sigma^2 + bc\sum\beta_j^2/(r - 1)$
AB                    (c - 1)(r - 1)
Split-plot error      c(b - 1)(r - 1)     $\sigma^2$
Total                 bcr - 1


Block 1 


Block 2 


Block 3 


Table 13.44 


AiBi (21) 


44 B 2 (8) 

42^2 (13) 


44 P. (16) 



43 (26) 


4,^, (26) 

AiB, (16) 


4,^2 (24) 



4,^2 (26) 


44 B 2 (17) 

4i5, (26) 


445 , (23) 


4,B2 (21) 


43 B, (21) 

4,S, (23) 


43 B, (13) 



AiB^ (13) 


42 B 2 (19) 

44 5, (19) 


42 B, (25) 



42^1 (28) 


4,S2 (13) 

42^2 (20) 


43 B, (23) 


13.23. Spherical particles ^ in. in diameter were dropped into a glass tube three 
in. in diameter, and the percentage of porosity was measured. Data for 
an experiment with four kinds of material, two heights of drop, and eight 
distances from the wall of the container are given in Table 13.45. 


Table 13.45 

Percentages of Porosities of Successive Annuli or Layers 


Material 

Deposition 

Over-all 

Porosity 

0 

Dis 

tame from 
1 ] 

Wall (inch 

1 4 

les) 

L 

1 

1 

Lead 

Cascaded 

39.3 

47.3 

40.3 

36.6 

37.5 

37.0 


37.3 

37.7 


36 in. drop 

38.0 

44.0 

35.3 

34.2 

37.7 

38.4 


36.6 

38.7 

Phosphor- 

Cascaded 

40.9 

49.0 

39.3 

37.8 

38.1 

40.1 

39.3 

40.2 

40.3 

bronze 

36 in. drop 

38.2 

45.7 

37.3 

33.1 

34.0 

38.3 

37.4 

37.3 

37.9 

Polystyrene 

Cascaded 

40.9 

47.8 

38.8 

39.2 

38.7 

40.3 

41.1 

40.6 

40.8 


36 in. drop 

36.3 

45.0 

34.0 

29.1 

33.8 

37.0 

37.1 

37.5 

35.5 

Glass 

Cascaded 

40.1 

47.9 

39.3 

39.0 

36.4 

37.8 

38.8 

38.5 

37.3 


36 in. drop 

35.6 

42.4 

37.9 

29.8 

31.5 

34 . 5 ! 

1 

36.1 

34.9 

36.6 


* Data from J. C. Macrae and W. A. Gray, “Significance of the Properties of 
Materials in the Packing of Real Spherical Particles,” British Journal of Applied 
Physics, Vol. 12 (1961), pp. 164-72, Table 6. 









Analyze the data to determine the effects of material, deposition, and
distance on porosity.
14

REGRESSION AND RELATED TOPICS 


The dependence of the mean response on increasing levels of a quanti- 
tative factor (or quantitative factors) is studied. It is shown how the sum of 
squares identity and procedures for estimation and hypothesis testing may 
be extended to problems in which observations are made on more than one 
characteristic of an object. Linear and polynomial regression are developed 
in some detail. It is shown how orthogonal polynomials may be applied in 
determining the nature of a trend. Correlation is related to regression. Simple 
covariance analysis is used in comparing slopes of lines. 


14.1. INTRODUCTION 

So far in our studies we have analyzed differences in responses at 
different levels of a treatment factor or at different treatment combinations. 
We have not studied any trend that might exist between response and levels 
of a factor (or combination of levels of several factors). That is, we have not 
attempted to relate changes in responses to changes in levels of the treatment 
factor. For example, in the soil-sample experiment (Example 12.1) we were 
concerned with the acidity level at different depths, but we were not con- 
cerned with a trend of acidity level with depth of soil. In many problems 
the relationship between the levels of a quantitative variable $x_u$ and the
corresponding average response $\eta_u = \mu_u$ should be taken into account.
Some examples are

1. The average tensile strength of cement increases with curing time.

2. The amount of β-erythroidine in an aqueous solution changes in a
regular fashion with the colorimeter reading of the turbidity.

3. Students' grades on a subject might be related to a college entrance
examination.

4. The yield of a crop is related to the amount of water and amount of
fertilizer.

5. The maximum temperature during each day is related to the time of
year.

6. The price of a certain type of item is related to time, cost of raw
material, and cost of labor.

In some areas of study relationships are said to be exact and are expressed
as mathematical functions of the form

$$y = G(x_1, \ldots, x_k)$$

This is particularly true in physics and some areas of chemistry. Examples of
such relations are found in Boyle's gas law, Newton's law of force and
acceleration, Ohm's law in electricity, etc. Such relations may be used to
predict y with good accuracy when values of $x_1, \ldots, x_k$ are specified.
There may be problems in finding the "correct" relationship or in making
accurate measurements, but there is seldom any doubt that a single "correct"
underlying functional relationship exists. We do not consider such "exact"
functional relations.

In most investigations y cannot be determined exactly from a set of
values $x_1, \ldots, x_k$, even when the x's are known without error. For
example, there is no functional relationship which gives the exact weight of
a person whose exact height is known. Still, it is well known that a tall man
is likely to weigh more than a short man. In fact, we are told by health
experts what weight is best for a man of a given age, body build, and height.
In this type of problem it is often assumed that a functional relationship
exists between the variables $x_1, \ldots, x_k$ and the average response
$\eta$ associated with these variables. That is, it is assumed that

$$\eta = g(x_1, \ldots, x_k; \theta_1, \ldots, \theta_p) \tag{14.1}$$

where $\theta_1, \ldots, \theta_p$ denote parameters. The functional relation
given in Eq. (14.1) is known as a regression equation. Usually the problem is
to specify the best functional form (family) and then to determine values of
the parameters which give the most appropriate equation (particular member of
the family) in a given experimental situation.

How does one set about selecting a particular regression equation for the 
population being studied? The functional form may be determined by the 
experimenter by (1) taking into account his knowledge of the theoretical 
structure existing or seeming to exist among the variables involved or (2) by 
plotting sample points in a scatter diagram. Both methods are important in 
experimentation. Method (1) should be given first consideration, as it is also 
useful in selecting the appropriate variables before the observations are made. 
Method (2) should be used to supplement method (1), or, if little is known 
about the underlying theoretical structure, to make a decision about the form 
of the relationship. The scatter diagram has limited use, since it consists in 
plotting sample points in a rectangular co-ordinate system with axes 
corresponding to the variables involved. The scatter diagram is illustrated in 
Example 14.1. 

In this chapter we usually assume the functional form to be known and 
present a method, the method of least squares, for choosing estimates of 
particular parameters which specify a unique relationship connecting the 
average response $\eta$ with the variables $x_1, \ldots, x_k$. The method of 
least squares gives estimators of the parameters $\theta_1, \ldots, \theta_p$ 
which are unbiased and have minimum variance among all estimators which are 
linear functions of the observations. In fact, if we assume the response 
variable to be normally distributed, it can be shown that the method of least 
squares is the same as the method of maximum likelihood, the most widely 
accepted general procedure for estimation at this time. The method of least 
squares is simply a procedure for finding estimators 
$\hat\theta_1, \ldots, \hat\theta_p$ of parameters 
$\theta_1, \ldots, \theta_p$ which minimize the function 

$$Q = \sum_{u=1}^{n} (y_u - \eta_u)^2 \tag{14.2}$$

where 

$$\eta_u = g(x_{1u}, \ldots, x_{ku}; \theta_1, \ldots, \theta_p) \quad (u = 1, \ldots, n) \tag{14.3}$$

and $(y_u, x_{1u}, \ldots, x_{ku})$ denotes the $u$th set of observations on 
$k + 1$ variables, only $y_u$ being random. The relation 

$$\hat\eta = g(x_1, \ldots, x_k; \hat\theta_1, \ldots, \hat\theta_p) \tag{14.4}$$

is called the equation of the regression curve of best fit. The above 
procedures, along with some notation, are illustrated in the following example. 

Example 14.1. Use the following hypothetical data to (a) plot a scatter 
diagram, (b) find the least-squares estimates of the parameters in the line of 
best fit, (c) find the error mean square of the deviates about the line of best 
fit, and (d) test the hypothesis that the slope of the true regression line is 
zero. 

    x    1    4    4    6    6    6
    y    7    5    3    3    2    1



The scatter diagram is shown in Fig. 14.1, and it is obvious that a straight 
line should be used to fit the data. In fact, it appears that the line through 
the points with co-ordinates (1, 7) and (6, 2) is the appropriate line. 

Fig. 14.1 Scatter Diagram for the Data in Example 14.1 

For every pair of values $(x_u, y_u)$ it is assumed that $x_u$ is fixed and 
$y_u$ is a random variable. That is, for each fixed value of $x$ there is a 
corresponding array of $y$'s. We assume, as is usually the case, that each 
array of $y$'s corresponding to a fixed $x$ is randomly and normally 
distributed with a common variance $\sigma^2$. It is to be understood that $x$ 
is a quantitative variable which can be observed without error. (In 
particular, all one need assume is that the error in $x$ is negligible as 
compared to that in $y$.) In this example, the mean $\eta$ of an array of 
$y$'s is assumed to fall along a straight line, that is, the mean $\eta$ is 
given by 

$$\eta = \alpha + \beta x \tag{14.5}$$

where $\alpha$ and $\beta$ are parameters. The model equation for the $u$th 
pair of observations $(x_u, y_u)$ is given by 

$$y_u = \eta_u + \epsilon_u \quad (u = 1, \ldots, n)$$

or 

$$y_u = \alpha + \beta x_u + \epsilon_u \quad (u = 1, \ldots, n)$$

Our problem is to find least-squares estimators $a$ and $b$ of $\alpha$ and 
$\beta$, respectively, and hence an estimator 

$$\hat\eta_u = a + bx_u \tag{14.6}$$

of $\eta_u = \alpha + \beta x_u$ for each $u$. Sometimes we write Eq. (14.6) as 

$$\hat y_x = a + bx \tag{14.7}$$

indicating that the regression mean is dependent on $x$. We refer to Eq. 
(14.5) as the true regression of $y$ on $x$ and to Eq. (14.6) or Eq. (14.7) as 
the estimated regression of $y$ on $x$. 

By the definition of the method of least squares, $a$ and $b$ are values which 
make 

$$Q = \sum_u (y_u - a - bx_u)^2 \tag{14.8}$$

a minimum. Since $x_u$ and $y_u$ are data values, $Q$ is a function of $a$ and 
$b$. Thus, by the methods of the calculus (see Sect. 14.2), it can be shown 
that Eq. (14.8) is a minimum when 

$$b = \frac{n \sum x_u y_u - (\sum x_u)(\sum y_u)}{n \sum x_u^2 - (\sum x_u)^2} \tag{14.9}$$

and 

$$a = \bar y - b\bar x \tag{14.10}$$

After computing the totals shown in Table 14.1, we substitute them in 
Eqs. (14.9) and (14.10) to obtain 

$$b = \frac{(6)(75) - (27)(21)}{(6)(141) - (27)(27)} = -1$$

and 

$$a = \frac{21}{6} - (-1)\frac{27}{6} = 8$$

Therefore, the regression line of best fit is 

$$\hat y_x = 8 - x \tag{14.11}$$


Table 14.1 
Computations for Example 14.1 

              x      y    x^2    y^2     xy
              1      7      1     49      7
              4      5     16     25     20
              4      3     16      9     12
              6      3     36      9     18
              6      2     36      4     12
              6      1     36      1      6
Totals       27     21    141     97     75
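
The arithmetic behind Table 14.1 and Eqs. (14.9) and (14.10) can be checked with a short Python sketch. The variable names are our own and are not part of the original text; exact rational arithmetic is used only to keep this small example free of rounding noise.

```python
# Least-squares line for the data of Example 14.1, via Eqs. (14.9) and (14.10).
from fractions import Fraction

x = [1, 4, 4, 6, 6, 6]
y = [7, 5, 3, 3, 2, 1]
n = len(x)

# Column totals of Table 14.1.
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Slope from Eq. (14.9) and intercept from Eq. (14.10).
b = Fraction(n * sum_xy - sum_x * sum_y, n * sum_xx - sum_x ** 2)
a = Fraction(sum_y, n) - b * Fraction(sum_x, n)

print(f"b = {b}, a = {a}")   # b = -1, a = 8
```

The printed values agree with the line of best fit in Eq. (14.11).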


Equation (14.11) may be used to find $\hat\eta$ for any value of $x$ for which 
the true regression equation (14.5) is the model. For example, if it is assumed 
that Eq. (14.5) is the proper model for only those values of $x$ in the 
interval $1 \le x \le 6$, then estimates of $\eta$ should not be obtained for 
values of $x$ outside this interval. In most problems the line of best fit is 
assumed to hold for all values between the two extreme values of $x$ in the 
experiment. For example, in our problem, when $x = 2$ the estimated mean of the 
array is 6. But, in many cases, values of $x$ outside the two extreme values 
should not be used to find estimates of the means of the $y$ arrays. This is 
usually the case in problems in which the $x$-axis is a time axis. The point is 
that estimates of array means may usually be found from the line of best fit 
for all those values of $x$ between the two extreme $x$ values of the 
experiment, but caution should be applied in other cases. 

It should be noted that the estimated mean $\hat\eta_u$ of an array is also a 
$y$ value and that it is obtained by taking into account all $y$ values, not 
just those in the array above $x_u$. Further, note that the mean of all $y$'s 
corresponding to $x_u$ may or may not be the same as the estimated regression 
mean for the array corresponding to $x_u$. In our example, they happen to be 
the same for the arrays above $x = 1$, 4, and 6. 

The fitted regression line passes through point (4, 4) as well as the points 
(1, 7) and (6, 2), obtained by inspection in part (a). Thus, the sum of squares 
of deviates of the $y$ values about the fitted line is given by 

$$Q = (7 - 7)^2 + (5 - 4)^2 + (3 - 4)^2 + (3 - 2)^2 + (2 - 2)^2 + (1 - 2)^2 = 4$$

This is also called the residual sum of squares and is denoted by 
$SS_{\text{res}}$. It can be shown (see Sect. 14.2) that the residual mean 
square $s^2_{y/x}$, with $n - 2$ degrees of freedom, defined by 

$$s^2_{y/x} = \frac{SS_{\text{res}}}{n - 2}$$

is an unbiased estimator of $\sigma^2$. In our problem 

$$s^2_{y/x} = \frac{4}{4} = 1$$

has four degrees of freedom and is an unbiased estimate of the common 
unknown variance $\sigma^2$. 

The value $b = -1$ gives an unbiased point estimate of the true slope 
$\beta$. In many experiments it is not enough to find the point estimate of the 
slope. For example, it might be very desirable that we determine whether 
$b = -1$ is significantly different from zero. For this reason, we often test 
the null hypothesis that $\beta = 0$ against the alternative hypothesis that 
$\beta \ne 0$. This may be done by comparing the ratio of the regression mean 
square (see Sect. 14.2) 

$$s^2_{\text{reg}} = \frac{[n \sum x_u y_u - (\sum x_u)(\sum y_u)]^2}{n[n \sum x_u^2 - (\sum x_u)^2]}$$

to the residual mean square $s^2_{y/x}$, with the upper $\alpha$ level value of 
the $F$ distribution with 1 and $n - 2$ degrees of freedom. For the data of 
Example 14.1 we find that 

$$s^2_{\text{reg}} = \frac{[(6)(75) - (27)(21)]^2}{6[(6)(141) - (27)^2]} = 19.5$$

and 

$$F = \frac{s^2_{\text{reg}}}{s^2_{y/x}} = \frac{19.5}{1} = 19.5$$

Since 19.5 is greater than 7.71, the upper five per cent value of $F$ with one 
and four degrees of freedom, we reject the null hypothesis and conclude that 
$\beta \ne 0$. Since the sample slope is negative, we actually conclude that 
the true slope is negative. Thus, as $x$ increases, $y$ decreases. 

In this section we have given an indication of some of the types of 
problems in regression analysis. In the next section the theory for linear 
regression of $y$ on $x$ is developed. 
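
Parts (c) and (d) of Example 14.1 can be retraced numerically. The following is a minimal Python sketch under the assumption that the fitted line is $8 - x$; the names `ss_res`, `ms_res`, `ms_reg`, and `F_CRIT` are our own, and the critical value 7.71 is simply the tabled value quoted in the example rather than something computed here.

```python
# Residual mean square and F ratio for Example 14.1.
x = [1, 4, 4, 6, 6, 6]
y = [7, 5, 3, 3, 2, 1]
n = len(x)
a, b = 8.0, -1.0   # least-squares estimates from Eq. (14.11)

# Residual sum of squares and residual mean square (n - 2 degrees of freedom).
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
ms_res = ss_res / (n - 2)

# Regression mean square (1 degree of freedom), written with raw sums.
sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
ms_reg = (n * sum_xy - sum_x * sum_y) ** 2 / (n * (n * sum_xx - sum_x ** 2))

f_ratio = ms_reg / ms_res
F_CRIT = 7.71   # upper 5 per cent point of F(1, 4), as quoted in the text

print("SSres =", ss_res, " residual MS =", ms_res)
print("regression MS =", ms_reg, " F =", f_ratio)
print("reject beta = 0:", f_ratio > F_CRIT)
```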

14.2. SIMPLE LINEAR REGRESSION 

Linear regression of y on x is sometimes called simple linear regression. 
The key formulas of Sect. 14.1 are now brought together for quick reference 
and comparison. 

The model equation for the $u$th pair of observations $(x_u, y_u)$ is given by 

$$y_u = \eta_u + \epsilon_u \quad (u = 1, \ldots, n) \tag{14.12}$$

where 

$$\eta_u = \alpha + \beta x_u \quad (u = 1, \ldots, n) \tag{14.13}$$

denotes the true regression of $y$ on $x$. The estimating equation is given by 

$$y_u = \hat\eta_u + e_u \tag{14.14}$$

where 

$$\hat\eta_u = \hat y_{x_u} = a + bx_u \tag{14.15}$$

denotes the estimated regression of $y$ on $x$, and 

$$e_u = y_u - \hat\eta_u \tag{14.16}$$

denotes the amount the $u$th random observation $y_u$ deviates from the 
corresponding estimated regression ordinate. By studying Fig. 14.2, relations 
among the true and estimated regression and observation should become clearer. 
Note that $\bar y = a + b\bar x$, where 

$$\bar x = \frac{\sum x_u}{n} \quad \text{and} \quad \bar y = \frac{\sum y_u}{n}$$

Fig. 14.2 An Example Relating the True and Estimated Regression 
of y on x to a Typical Pair of Observations $(x_u, y_u)$ 


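The estimators $a$ and $b$ are selected so as to make 

$$Q = \sum_u e_u^2 = \sum_u (y_u - a - bx_u)^2 \tag{14.17}$$

a minimum. If we think of $Q$ as a function of $a$ and $b$, those values of 
$a$ and $b$ which make $Q$ a minimum, if it exists, are obtained by solving 
simultaneously the equations 

$$\frac{\partial Q}{\partial a} = 0, \qquad \frac{\partial Q}{\partial b} = 0$$

The partial derivatives of $Q$ with respect to $a$ and $b$ are given by 

$$\frac{\partial Q}{\partial a} = -2 \sum_u (y_u - a - bx_u)$$

$$\frac{\partial Q}{\partial b} = -2 \sum_u (y_u - a - bx_u)x_u$$

respectively. Setting these partials equal to zero and rearranging terms gives 

$$\begin{aligned}
na + \Big(\sum x_u\Big)b &= \sum y_u \\
\Big(\sum x_u\Big)a + \Big(\sum x_u^2\Big)b &= \sum x_u y_u
\end{aligned} \tag{14.18}$$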

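which are referred to as the normal equations for estimating $\alpha$ and 
$\beta$. Any time Eq. (14.18) has a unique solution for $a$ and $b$, these 
values make $Q$ a minimum, since $Q$ is bounded below (i.e., cannot be less 
than zero). The solution may be written as 

$$b = \frac{SP}{SSx} \quad \text{and} \quad a = \bar y - b\bar x \tag{14.19}$$

where 

$$SP = \sum x_u y_u - \frac{(\sum x_u)(\sum y_u)}{n} = \sum (x_u - \bar x)(y_u - \bar y) \tag{14.20}$$

and 

$$SSx = \sum x_u^2 - \frac{(\sum x_u)^2}{n} = \sum (x_u - \bar x)^2 \tag{14.21}$$

When $y_u$ is a random variable, it follows that $b$, $a$, and $\hat\eta$ are 
unbiased estimators of $\beta$, $\alpha$, and $\eta$, respectively. Since 

$$\begin{aligned}
SP &= \sum_u (x_u - \bar x)(y_u - \bar y) \\
   &= \sum_u (x_u - \bar x)y_u - \bar y \sum_u (x_u - \bar x) \\
   &= \sum_u (x_u - \bar x)y_u
\end{aligned}$$

we may express $b$ as the following linear function of $y_1, \ldots, y_n$: 

$$b = \sum_u \frac{(x_u - \bar x)}{SSx}\, y_u \tag{14.22}$$

Thus 

$$E(b) = \frac{1}{SSx} \sum_u (x_u - \bar x)(\alpha + \beta x_u) = \frac{1}{SSx}\Big[\alpha \sum_u (x_u - \bar x) + \beta \sum_u (x_u - \bar x)x_u\Big]$$

or 

$$E(b) = \beta \tag{14.23}$$

since 

$$\sum_u (x_u - \bar x) = 0 \quad \text{and} \quad \sum_u (x_u - \bar x)x_u = \sum_u (x_u - \bar x)(x_u - \bar x) = SSx$$

Further 

$$\begin{aligned}
E(a) &= E(\bar y) - \bar x E(b) \\
     &= \frac{1}{n}[E(y_1) + \cdots + E(y_n)] - \bar x E(b) \\
     &= \frac{1}{n}[(\alpha + \beta x_1) + \cdots + (\alpha + \beta x_n)] - \bar x \beta \\
     &= \alpha + \beta\Big(\frac{\sum x_u}{n} - \bar x\Big)
\end{aligned}$$

or 

$$E(a) = \alpha \tag{14.24}$$

Also 

$$E(\hat\eta) = E(a) + xE(b) = \alpha + \beta x = \eta \tag{14.25}$$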

Note that it is to be understood that the same set of $x$'s occur in repeated 
samples of size $n$. 

In order to test hypotheses about and establish confidence intervals for 
the parameters $\beta$, $\alpha$, and $\eta$, it is necessary to know the 
nature of the distributions of $b$, $a$, and $\hat\eta$, respectively. For this 
purpose we assume that 

(a) The model equation is given by Eq. (14.12). 

(b) $x_u$ is an exact observation and $y_u$ is a random variable. 

(c) All arrays of $y$'s have the same variance $\sigma^2$.            (14.26) 

(d) For every $x$ there is an array of $y$'s which has the normal 
distribution. 

These assumptions are pictured in Fig. 14.3. It should be clear that the 
assumptions, with the exception of (a), are the same as those made for the 
fixed-effects model in analysis of variance. In fact, when $\beta = 0$ the two 
are identical. Thus, much of our attention will be given to $\beta$ and its 
estimator $b$. In this way we emphasize those topics in regression which are 
different from the usual analysis of variance. 



Fig. 14.3 Diagram Illustrating the Assumptions in (14.26) 

To find unbiased point estimators of $\beta$, $\alpha$, and $\eta$, only 
assumptions (a) and (b) were required, but to obtain confidence intervals for 
these parameters we use all assumptions in (14.26). Applying assumptions (a) 
and (b) along with linear relation (14.22) and Theorem 6.1, we see that the 
variance of $b$ is given by 

$$\sigma_b^2 = \sum_u \Big[\frac{x_u - \bar x}{SSx}\Big]^2 \sigma_{y_u}^2 \tag{14.27}$$

If, in addition, we assume that assumption (c) holds, then Eq. (14.27) 
becomes 

$$\sigma_b^2 = \frac{\sigma^2}{(SSx)^2} \sum_u (x_u - \bar x)^2$$

or 

$$\sigma_b^2 = \frac{\sigma^2}{SSx} \tag{14.28}$$

Finally, if assumption (d) is used in addition to assumptions (a), (b), and (c), 
it follows that $b$ is normally distributed with mean $\beta$ and variance 
$\sigma_b^2$. Therefore 

$$\frac{(b - \beta)\sqrt{SSx}}{\sigma} \tag{14.29}$$

has the standard normal distribution. When $\sigma^2$ is known, we can use 
(14.29) to test the null hypothesis $\beta = \beta_0$, where $\beta_0$ is a 
constant, and to find $100(1 - \alpha)$ per cent confidence intervals for 
$\beta$. The symmetric confidence interval is given by 

$$b - z_{\alpha/2}\frac{\sigma}{\sqrt{SSx}} < \beta < b + z_{\alpha/2}\frac{\sigma}{\sqrt{SSx}} \tag{14.30}$$

where $z_{\alpha/2}$ denotes the upper $\alpha/2$ point of the standard normal 
distribution. 
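
The statements $E(b) = \beta$ and $\sigma_b^2 = \sigma^2/SSx$ can be illustrated by repeated sampling at a fixed set of $x$'s. The sketch below is only a simulation under assumptions (a) through (d); the function name `simulate_slopes`, the parameter values, and the seed are our own choices for illustration and do not come from the text.

```python
import random
import statistics


def simulate_slopes(x, alpha, beta, sigma, reps, seed=0):
    """Draw `reps` samples of y at the fixed x's and return the fitted slopes."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    slopes = []
    for _ in range(reps):
        # Each y is normal about the true line with common variance sigma^2.
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        # b is the linear function of the y's given in Eq. (14.22).
        slopes.append(sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / ss_x)
    return slopes, ss_x


x = [1, 4, 4, 6, 6, 6]   # the x's of Example 14.1, held fixed in every sample
slopes, ss_x = simulate_slopes(x, alpha=8.0, beta=-1.0, sigma=1.0, reps=20000)

print("mean of simulated b     :", statistics.fmean(slopes))
print("variance of simulated b :", statistics.pvariance(slopes))
print("sigma^2 / SSx           :", 1.0 / ss_x)
```

With a large number of repetitions the simulated mean should lie close to the assumed slope and the simulated variance close to $\sigma^2/SSx$, although any single run is subject to sampling variation.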


The variance $\sigma^2$ is not usually known. Hence, a sample variance must 
be used. It can be shown by the methods of Chap. 11 that 

$$s^2_{y/x} = \frac{SS_{\text{res}}}{n - 2} = \frac{\sum_u (y_u - a - bx_u)^2}{n - 2} \tag{14.31}$$

is an unbiased estimator of $\sigma^2$. (The divisor $n - 2$ makes $s^2_{y/x}$ 
an unbiased estimator of $\sigma^2$. This seems reasonable, since the deviates 
about $a + bx$ are used, and $a$ and $b$ are two linear restrictions among the 
observations $y_1, \ldots, y_n$.) Thus 

$$\frac{(n - 2)s^2_{y/x}}{\sigma^2} = \frac{SS_{\text{res}}}{\sigma^2} \tag{14.32}$$

is distributed as $\chi^2$ with $n - 2$ degrees of freedom and 

$$\frac{(b - \beta)\sqrt{SSx}}{s_{y/x}} \tag{14.33}$$

is distributed as $t$ with $n - 2$ degrees of freedom. The standardized 
statistic in (14.33) may be used to test the null hypothesis 
$\beta = \beta_0$ in the same way that 

$$\frac{(\bar x - \mu)\sqrt{n}}{s}$$

is used to test the null hypothesis $\mu = \mu_0$. Further, a $100(1 - \alpha)$ 
per cent confidence interval for $\beta$ is given by 

$$b - t_{\alpha/2}(n - 2)\frac{s_{y/x}}{\sqrt{SSx}} < \beta < b + t_{\alpha/2}(n - 2)\frac{s_{y/x}}{\sqrt{SSx}} \tag{14.34}$$

In Example 14.1 we used an $F$ distribution to illustrate a test of the 
hypothesis $\beta = 0$. This was done mainly to indicate that regression 
analysis is tied in with analysis of variance. Actually, in testing the null 
hypothesis $\beta = 0$ against the alternative hypothesis $\beta \ne 0$, the 
statistic (14.33) becomes 

$$\frac{b\sqrt{SSx}}{s_{y/x}}$$

and may be written as 

$$t^2(n - 2) = \frac{b^2 SSx}{s^2_{y/x}} = F(1, n - 2) \tag{14.35}$$

since $b$ is a linear combination with one degree of freedom. Thus, the 
two-sided $t$ test is identical to the one-sided $F$ test of the hypothesis 
$\beta = 0$. However, it should be noted at this time that the $F$ statistic in 
the example cannot be used to test the null hypothesis $\beta = \beta_0$, when 
$\beta_0 \ne 0$, or to establish a confidence interval for $\beta$. For these 
reasons the $t$ distribution is generally preferred to the $F$ distribution in 
working with simple linear regression. 
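
The relation between the $t$ statistic and the $F$ ratio can be checked on the numbers of Example 14.1. The Python sketch below assumes $b = -1$, $SSx = 19.5$, and $s^2_{y/x} = 1$ as found there; the helper name `t_statistic` and the trial value $\beta_0 = -0.5$ are hypothetical and serve only to show that the $t$ form also handles a nonzero hypothesized slope.

```python
import math


def t_statistic(b, beta0, ss_x, s):
    """Standardized slope (b - beta0) * sqrt(SSx) / s, as in statistic (14.33)."""
    return (b - beta0) * math.sqrt(ss_x) / s


b, ss_x, s2 = -1.0, 19.5, 1.0          # values from Example 14.1
s = math.sqrt(s2)

t_zero = t_statistic(b, 0.0, ss_x, s)  # test of beta = 0
f_ratio = b ** 2 * ss_x / s2           # F ratio with 1 and n - 2 degrees of freedom

print("t =", t_zero, " t^2 =", t_zero ** 2, " F =", f_ratio)

# The t form applies equally to a nonzero hypothesized slope (illustrative value).
print("t for beta0 = -0.5:", t_statistic(b, -0.5, ss_x, s))
```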

It should be noted that the variance of the estimator $b$, given in Eq. 
(14.28), decreases with increasing values of $SSx$. Thus, values of $x$ should 
be as widely scattered as possible in order to make $SSx$ as large as possible 
in an experiment. For example, if there is strong evidence that the regression 
of $y$ on $x$ is linear from $x_1$ to $x_m$ but may not be linear outside this 
interval, then half of the observations should be made at $x_1$ and half at 
$x_m$ in order to obtain minimum variance. If, for some of the values 
intermediate to $x_1$ and $x_m$, there is doubt that the regression is linear 
then, of course, some observations should be made at the intermediate values. 

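Sometimes it is important to know the nature of the distribution of the 
estimated regression mean $\hat\eta_0$ of an array of $y$'s corresponding to a 
particular $x_0$. Substituting $a = \bar y - b\bar x$ in Eq. (14.15) gives 

$$\hat\eta_0 = \bar y + b(x_0 - \bar x) \tag{14.36}$$

where $(x_0, \hat\eta_0)$ is any point on the admissible fitted regression 
line. Since $\bar y$ and $b$ are normally distributed random variables, the 
linear combination in (14.36) is also normally distributed with mean 

$$\mu_{\hat\eta_0} = \eta_0 = \alpha + \beta x_0 \tag{14.37}$$

and variance 

$$\sigma^2_{\hat\eta_0} = \sigma^2\Big[\frac{1}{n} + \frac{(x_0 - \bar x)^2}{SSx}\Big] \tag{14.38}$$

The mean was found in Eq. (14.25). We obtain the variance as follows, using 
the notation $\sigma_q^2 = \text{Var}(q)$, where $q$ is any random variable: 

$$\begin{aligned}
\sigma^2_{\hat\eta_0} = \text{Var}(\hat\eta_0) &= \text{Var}[\bar y + (x_0 - \bar x)b] \\
 &= \text{Var}(\bar y) + \text{Var}[(x_0 - \bar x)b] + 2\,\text{cov}[\bar y, (x_0 - \bar x)b] \\
 &= \frac{\sigma^2}{n} + (x_0 - \bar x)^2\frac{\sigma^2}{SSx} + 2 \cdot 0
\end{aligned}$$

since 

$$\begin{aligned}
\text{cov}[\bar y, (x_0 - \bar x)b] &= \frac{(x_0 - \bar x)}{n\,SSx}\,\text{cov}[y_1 + \cdots + y_n,\ (x_1 - \bar x)y_1 + \cdots + (x_n - \bar x)y_n] \\
 &= \frac{(x_0 - \bar x)}{n\,SSx}\{\text{cov}[y_1, (x_1 - \bar x)y_1] + \cdots + \text{cov}[y_1, (x_n - \bar x)y_n] \\
 &\qquad + \cdots + \text{cov}[y_n, (x_1 - \bar x)y_1] + \cdots + \text{cov}[y_n, (x_n - \bar x)y_n]\} \\
 &= \frac{(x_0 - \bar x)}{n\,SSx}[(x_1 - \bar x)\text{Var}(y_1) + 0 + \cdots + 0 + (x_n - \bar x)\text{Var}(y_n)] \\
 &= \frac{(x_0 - \bar x)\sigma^2}{n\,SSx}\Big[\sum_u (x_u - \bar x)\Big] = 0
\end{aligned}$$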

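If $\sigma^2$ is unknown, it follows that 

$$\frac{\hat\eta_0 - \eta_0}{s_{\hat\eta_0}} = \frac{\hat\eta_0 - (\alpha + \beta x_0)}{s_{y/x}\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar x)^2}{SSx}}} \tag{14.39}$$

has the $t$ distribution with $n - 2$ degrees of freedom. There are two cases 
of particular interest. When $x_0 = \bar x$, then $\hat\eta_0 = \bar y$, and 
Eq. (14.39) reduces to 

$$\frac{\bar y - \mu_{\bar y}}{s_{\bar y}} = \frac{\bar y - (\alpha + \beta\bar x)}{s_{y/x}/\sqrt{n}} \tag{14.40}$$

where $s_{\bar y} = s_{y/x}/\sqrt{n}$ has $n - 2$ degrees of freedom. When 
$x_0 = 0$, $\hat\eta_0 = \bar y - b\bar x = a$, and Eq. (14.39) reduces to 

$$\frac{a - \alpha}{s_{y/x}\sqrt{\dfrac{1}{n} + \dfrac{\bar x^2}{SSx}}} \tag{14.41}$$

which is also distributed as $t$ with $n - 2$ degrees of freedom. The symmetric 
$100(1 - \alpha)$ per cent confidence limits of $\eta_0 = \alpha + \beta x_0$, 
at $x = x_0$, are given by 

$$\bar y + b(x_0 - \bar x) \pm t_{\alpha/2}(n - 2)\, s_{y/x}\sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{SSx}} \tag{14.42}$$

The limits are functions of $x_0$; the greater the distance between $x_0$ and 
$\bar x$, the more extreme the limits of $\eta_0$. The symmetric 
$100(1 - \alpha)$ per cent confidence limits of $\alpha$, the parameter, are 

$$a \pm t_{\alpha/2}(n - 2)\, s_{y/x}\sqrt{\frac{1}{n} + \frac{\bar x^2}{SSx}} \tag{14.43}$$

These principles are illustrated in the following example. 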

Example 14.2. Use the data of Example 14.1 to find (a) a symmetric 
90 per cent confidence interval for $\beta$ and (b) symmetric 90 per cent 
confidence intervals for $\eta_0$ when $x_0 = 3$ and 4.5. 

From Example 14.1 we know that $b = -1$, $SSx = 19.5$, and $s^2_{y/x} = 1$ 
with four degrees of freedom. Since $t_{.05}(4) = 2.13$, we find, using 
(14.34), that the 90 per cent confidence interval of $\beta$ is 

$$-1.48 < \beta < -0.52$$

Since both bounds of the slope are negative, one has a high degree of 
confidence that the response decreases with increasing $x$. From this 
information we could also reject the null hypothesis $\beta = 0$ and conclude 
that $\beta$ is negative, understanding that there is at most a ten per cent 
chance of making an error. We know also that $\bar x = 27/6$ and 
$\bar y = 21/6$. Thus, when $x_0 = 3$, 
$\hat\eta_0 = \bar y + b(x_0 - \bar x) = 5$, and when $x_0 = 4.5 = \bar x$, 
$\hat\eta_0 = \bar y = 3.5$. Now substituting in Eq. (14.42) gives confidence 
intervals 

$$3.87 < (\alpha + 3\beta) < 6.13 \tag{14.44}$$

and 

$$2.63 < (\alpha + 4.5\beta) < 4.37 \tag{14.45}$$

Note that the length of the interval in (14.44) is 2.26 and that in (14.45) it 
is 1.74. Since $x_0 = 6$ is 1.5 units larger than $\bar x$, we know that the 
length of the interval about $\eta_0 = \alpha + 6\beta$ is also 2.26. In other 
words, we can expect our estimate of $\eta$ to be best in the vicinity of the 
mean $\bar x$. 
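
The intervals of Example 14.2 follow directly from (14.34) and (14.42) and can be reproduced with a few lines of Python. This is a sketch only: the function names `slope_interval` and `mean_interval` are our own, and the value 2.13 for $t_{.05}(4)$ is taken from the example rather than computed.

```python
import math


def slope_interval(b, ss_x, s, t_crit):
    """Symmetric confidence interval for the slope, as in (14.34)."""
    half = t_crit * s / math.sqrt(ss_x)
    return b - half, b + half


def mean_interval(x0, x_bar, y_bar, b, n, ss_x, s, t_crit):
    """Symmetric confidence limits for the array mean at x0, as in (14.42)."""
    centre = y_bar + b * (x0 - x_bar)
    half = t_crit * s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / ss_x)
    return centre - half, centre + half


n, x_bar, y_bar = 6, 27 / 6, 21 / 6
b, ss_x, s = -1.0, 19.5, 1.0
T_CRIT = 2.13   # t_.05(4), as quoted in the example

beta_lo, beta_hi = slope_interval(b, ss_x, s, T_CRIT)
print("slope:", round(beta_lo, 2), round(beta_hi, 2))

for x0 in (3.0, 4.5, 6.0):
    lo, hi = mean_interval(x0, x_bar, y_bar, b, n, ss_x, s, T_CRIT)
    print("x0 =", x0, "limits:", round(lo, 2), round(hi, 2),
          "length:", round(hi - lo, 2))
```

The intervals at $x_0 = 3$ and $x_0 = 6$ have equal length because both points lie 1.5 units from $\bar x$.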

Sometimes it is reasonable to assume that $\alpha = 0$. The true regression of 
$y$ on $x$ then becomes 

$$\eta = \beta x \tag{14.46}$$

It can be shown by a method analogous to the above that the least-squares 
estimator of $\beta$ is 

$$b = \frac{\sum_u x_u y_u}{\sum_u x_u^2} \tag{14.47}$$

and that 

$$s^2_{y/x} = \frac{\sum_u (y_u - bx_u)^2}{n - 1} \tag{14.48}$$

is an unbiased estimator of $\sigma^2$ with $n - 1$ degrees of freedom. It then 
follows that 

$$\frac{(b - \beta)\sqrt{\sum_u x_u^2}}{s_{y/x}} \tag{14.49}$$

is distributed as $t$ with $n - 1$ degrees of freedom. 
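
A small numerical illustration of regression through the origin may help fix Eqs. (14.47) through (14.49). The data below are hypothetical and were invented for this sketch; the function name `fit_through_origin` is likewise our own.

```python
import math


def fit_through_origin(x, y):
    """Return (b, s2) for the model eta = beta * x, per Eqs. (14.47)-(14.48)."""
    n = len(x)
    sum_xx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum_xx
    # Only one parameter is estimated, so the divisor is n - 1.
    s2 = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    return b, s2


# Hypothetical data, roughly proportional to x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

b, s2 = fit_through_origin(x, y)

# Standardized statistic (14.49) for an illustrative hypothesized slope of 2.
beta0 = 2.0
t_stat = (b - beta0) * math.sqrt(sum(xi * xi for xi in x)) / math.sqrt(s2)

print("b =", b, " s^2 =", s2, " t =", t_stat)
```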

Analysis of variance in a simple linear regression is best explained in 
terms of a sum of squares identity. Such an identity is derived by substituting 
Eqs. (14.19), (14.20), and (14.21) in the expression of $SS_{\text{res}}$ 
defined in Eq. (14.17). Then 

$$\begin{aligned}
SS_{\text{res}} &= \sum_u (y_u - a - bx_u)^2 \\
 &= \sum_u [(y_u - \bar y) - b(x_u - \bar x)]^2 \\
 &= \sum_u (y_u - \bar y)^2 - 2b\sum_u (x_u - \bar x)(y_u - \bar y) + b^2\sum_u (x_u - \bar x)^2 \\
 &= \sum_u (y_u - \bar y)^2 - 2b \cdot b\,SSx + b^2 SSx
\end{aligned}$$

or 

$$SS_{\text{res}} = SSy - b^2 SSx \tag{14.50}$$

where 

$$SSy = \sum_u (y_u - \bar y)^2 \tag{14.51}$$

Using Eq. (14.50), we may write the sum of squares identity for simple linear 
regression as 

$$SSy = SS_{\text{res}} + SS_{\text{reg}} \tag{14.52}$$

where 

$$SS_{\text{reg}} = b^2 SSx = \frac{SP^2}{SSx} = b\,SP \tag{14.53}$$

Note that the sum of squares identity (14.52) can be obtained directly from 
the identity 

$$y_u - \bar y = (y_u - \hat\eta_u) + (\hat\eta_u - \bar y) \tag{14.54}$$

where 

$$\hat\eta_u - \bar y = (a + bx_u) - (a + b\bar x) = b(x_u - \bar x)$$

The deviates on the right-hand side of Eq. (14.54) are shown in Fig. 14.2. 
The analysis of variance and expected mean squares for simple linear 
regression are given in Table 14.2. The test of the null hypothesis 
$\beta = 0$ has already been indicated. 

Table 14.2 
Analysis of Variance for Simple Linear Regression 

Source of           Sum of                  Degrees of   Mean                       Expected Mean
Variation           Squares                 Freedom      Squares                    Squares
Due to regression   SSreg                   1            SSreg/1                    sigma^2 + beta^2 SSx
Deviation about     SSres = SSy - SSreg     n - 2        s^2_{y/x} = SSres/(n - 2)  sigma^2
  regression
Total               SSy                     n - 1
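
The sum of squares identity (14.52) and the entries of Table 14.2 can be verified on the data of Example 14.1. The following Python sketch uses exact fractions so that the identity holds without rounding; all variable names are our own.

```python
from fractions import Fraction

x = [1, 4, 4, 6, 6, 6]
y = [7, 5, 3, 3, 2, 1]
n = len(x)
x_bar = Fraction(sum(x), n)
y_bar = Fraction(sum(y), n)

# Sums of squares and products of deviates, Eqs. (14.20), (14.21), (14.51).
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sp / ss_x                  # Eq. (14.19)
a = y_bar - b * x_bar
ss_reg = b * sp                # Eq. (14.53)
ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

# Rows of the analysis of variance table: (source, SS, df, mean square).
rows = [
    ("Due to regression", ss_reg, 1, ss_reg / 1),
    ("Deviation about regression", ss_res, n - 2, ss_res / (n - 2)),
    ("Total", ss_y, n - 1, None),
]
for source, ss, df, ms in rows:
    print(f"{source:28s} SS = {float(ss):6.2f}  df = {df}  MS = {ms}")
```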




14.3. A TEST FOR LINEARITY OF REGRESSION 

In Sect. 14.2 we assumed the regression to be linear. A method for 
checking the validity of this assumption is often desirable. For example, 
one may suspect that data relating the curing time to tensile strength of a 
given size of cement cube are not linear. Hypothetical data for such an 
experiment are given in Table 14.3, and the scatter diagram and line of best 
fit are shown in Fig. 14.4. The data points are indicated by dots and the 
array means by circles. 

Table 14.3 
Data for Tensile Strength of Cement Cubes 

Cube       Curing Time     Tensile Strength
Number     in Days         in Pounds
           x               y
  1        2               19
  2        2               21
  3        3               24
  4        3               27
  5        3               27
  6        4               29
  7        4               31
  8        6               35
  9        6               36
 10        6               37

Fig. 14.4 Scatter Diagram and Fitted Regression Line 
for the Data of Table 14.3 

The method for testing linearity links regression analysis and analysis 
of variance of a one-way classification. Perhaps the scatter diagram indicated 
how this is possible. The four array means do not fall on a straight line. In 
fact, due to sampling variation, it would be very unusual to have four array 
means fall on a straight line. The problem is to determine if these means 
deviate too much from linearity to be explained by chance causes. Such a 
decision is made after comparing the mean square for deviation from 
regression with a valid error mean square. The required variance estimate is 
obtained by pooling the error variances of the arrays, that is, by finding the 
error variance of a one-way classification. It is for this reason that the 
notation of analysis of variance is used. 

Let data points be denoted by $(x_i, y_{iu})$. The regression line of best fit 
is still written as 

$$\hat y_{x_i} = a + bx_i$$

but the estimating equation is written as 

$$y_{iu} = a + bx_i + e_{iu} \quad (i = 1, \ldots, k;\ u = 1, \ldots, n_i) \tag{14.55}$$

The residual sum of squares, $SS_{\text{res}}$, then becomes 

$$SS_{\text{res}} = \sum_i \sum_u (y_{iu} - a - bx_i)^2$$

Now, denoting the mean of the $i$th array by 

$$\bar y_i = \frac{\sum_u y_{iu}}{n_i}$$

we may write 

$$SS_{\text{res}} = \sum_i \sum_u [(y_{iu} - \bar y_i) + (\bar y_i - \hat y_{x_i})]^2$$

or 

$$SS_{\text{res}} = \sum_i \sum_u (y_{iu} - \bar y_i)^2 + \sum_i n_i(\bar y_i - \hat y_{x_i})^2 \tag{14.56}$$

since 

$$\sum_i \sum_u (y_{iu} - \bar y_i)(\bar y_i - \hat y_{x_i}) = \sum_i (\bar y_i - \hat y_{x_i}) \sum_u (y_{iu} - \bar y_i) = 0$$

The first sum of squares on the right-hand side of Eq. (14.56) is the within 
sum of squares, $SSW$, in the one-way classification. The second sum of 
squares represents the sum of squares of deviates of array means about the 
estimated regression array means and is denoted by $SSD$. The degrees of 
freedom identity associated with the sum of squares identity (14.56) is 

$$n - 2 = (n - k) + (k - 2)$$

where 

$$n = \sum_i n_i$$

From the partition theory of Chap. 10 we know that $SSW$ and $SSD$ are 
independently distributed. Further, we know that $SSW/(n - k)$ is an unbiased 
estimator of $\sigma^2$. It can be shown by the methods of Chap. 11 that 
$SSD/(k - 2)$ is an unbiased estimator of $\sigma^2$, provided that the true 
regression is linear. Therefore, the ratio 

$$F = \frac{SSD/(k - 2)}{SSW/(n - k)} \tag{14.57}$$

is distributed as $F$ with $k - 2$ and $n - k$ degrees of freedom when the 
hypothesis of linearity is true, that is, when 

$$H_0: \eta_i = \alpha + \beta x_i \quad (i = 1, \ldots, k) \tag{14.58}$$

is true. Note that the hypothesis specifies only that the $k$ array means fall 
on a straight line. 

If the regression is not linear, the expected value of $SSD/(k - 2)$ is greater 
than $\sigma^2$. Thus the critical region for the $\alpha$ level test of 
(14.58) is made up of all values of $F$ greater than 
$F_\alpha = F_\alpha(k - 2, n - k)$. To test (14.58), use the experimental data 
and (14.57) to compute a sample statistic $F_c$, and then compare $F_c$ with 
$F_\alpha$. If $F_c > F_\alpha$, reject (14.58) and conclude that the 
regression is not linear; if $F_c \le F_\alpha$, fail to reject (14.58), 
understanding that there is not enough evidence to conclude that the true 
regression is different from $\eta = \alpha + \beta x$. (It should be noted 
that the test cannot be made unless more than one $y$ observation is made for 
at least one $x$. Otherwise, there would be zero degrees of freedom for the 
within mean square.) 

The sum of squares, $SSD$, is not usually computed by 

$$SSD = \sum_i n_i(\bar y_i - \hat y_{x_i})^2$$

The following derivation leads to a better computational formula. According 
to Eqs. (10.38) and (14.52), the total sum of squares can be partitioned in two 
ways. Equating these two expressions gives 

$$SS_{\text{res}} + SS_{\text{reg}} = SSW + SSTr$$

where $SSTr$ denotes the sum of squares among array means. Solving for 
$SS_{\text{res}}$ gives 

$$SS_{\text{res}} = SSW + (SSTr - SS_{\text{reg}}) \tag{14.59}$$

On comparing Eq. (14.56) with Eq. (14.59), we see that 

$$SSD = SSTr - SS_{\text{reg}} \tag{14.60}$$

Thus, for the test of linearity we make the usual one-way classification 
analysis, compute the regression sum of squares, and then apply Eq. (14.60). 
Such an analysis of variance is shown in Table 14.5. 

Example 14.3. Use the data in Table 14.3 to test the hypothesis of 
linearity, that is 

$$H_0: \eta_i = \alpha + \beta x_i \quad (i = 1, \ldots, 4)$$

Table 14.4 shows the data arranged in a one-way classification. The total 
and treatment sum of squares are found to be 348.4 and 336.4, respectively. 
From Table 14.3 we find $SP = 86.6$ and $SSx = 22.9$. Therefore, 
$SS_{\text{reg}} = 327.5$. The resulting analysis of variance is shown in 
Table 14.5. Since the computed $F$ is less than $F_{.05}(2, 6) = 5.14$, we fail 
to reject the hypothesis of linearity. That is, we do not have enough evidence 
to say that the regression is not linear. 


Table 14.4 
Data of Table 14.3 Arranged in a One-way Classification 

             Tensile Strength in Pounds for Day
              2      3      4      6
             19     24     29     35
             21     27     31     36
                    27            37
Totals       40     78     60    108     286


Table 14.5 
Analysis of Variance for Linearity of Regression 

Source of Variation            Sum of    Degrees of   Mean     Computed   Tabled
                               Squares   Freedom      Square   F          F
Among means                    336.4     3
  Regression                   327.5     1
  Deviation from regression      8.9     2            4.45     2.22       5.14
Within                          12.0     6            2.00
Total                          348.4     9
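
The entries of Table 14.5 can be retraced from the data of Table 14.3 using the computational formula (14.60). The Python sketch below is one way of organizing that arithmetic; the variable names are our own, and the comparison value 5.14 is the tabled value quoted in Example 14.3.

```python
from collections import defaultdict

# Data of Table 14.3: curing time in days (x), tensile strength in pounds (y).
x = [2, 2, 3, 3, 3, 4, 4, 6, 6, 6]
y = [19, 21, 24, 27, 27, 29, 31, 35, 36, 37]
n = len(x)
correction = sum(y) ** 2 / n

# Group the y's into arrays, one array per distinct x (one-way classification).
arrays = defaultdict(list)
for xi, yi in zip(x, y):
    arrays[xi].append(yi)
k = len(arrays)

ss_total = sum(yi ** 2 for yi in y) - correction
ss_tr = sum(sum(v) ** 2 / len(v) for v in arrays.values()) - correction
ss_w = ss_total - ss_tr                     # within sum of squares

sp = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_x = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_reg = sp ** 2 / ss_x                     # Eq. (14.53)
ss_d = ss_tr - ss_reg                       # Eq. (14.60)

f_c = (ss_d / (k - 2)) / (ss_w / (n - k))   # Eq. (14.57)
F_TABLED = 5.14                             # F_.05(2, 6), as quoted in the text

print("SSTr =", round(ss_tr, 1), " SSreg =", round(ss_reg, 1),
      " SSD =", round(ss_d, 1), " SSW =", round(ss_w, 1))
print("computed F =", f_c, " reject linearity:", f_c > F_TABLED)
```

Small differences in the last decimal place relative to Table 14.5 are to be expected, since the table works from rounded intermediate values.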



If our primary objective is to determine whether the regression is 
nonlinear or not, we should probably stop with the statement that the data fit 
the regression equation 

$$\hat y_x = 13.9 + 3.78x \tag{14.61}$$

satisfactorily. [Eq. (14.61) was obtained from the data of Table 14.3.] 
However, if our main objective is to establish a confidence interval or to test 
a hypothesis about $\beta$, the test of linearity should be considered a 
preliminary test. In this case, it is doubtful whether the significance level 
should be as low as five per cent; perhaps it should be in the neighborhood of 
25 or 40 per cent. 

If the hypothesis is rejected, it is necessary to look further for the 
best-fitting regression curve. This topic is discussed in Sect. 14.5. For 
further reading in simple linear regression see Refs. [1, 5, 9, 16, 32, 33, 36, 
44, 46, 52, 59]. 

14.4. MULTIPLE LINEAR REGRESSION 

There are numerous examples in many areas of study where variation 
in two or more variables affects the random response. Two illustrations, items 
4 and 6, are mentioned in Sect. 14.1. Others may be found almost 
anywhere regression analysis is presented. We use data taken from Gingrich 
and Meyer [31] and shown in Table 14.6 to illustrate principles of multiple 
regression. 

Table 14.6 
Aerial Stand Volume Data for Upland Oak 

Plot        Stand Height    Crown Cover     Volume per Plot
Number*     in Feet         in Per Cent     in Cubic Feet
            x_1             x_2             y
  1          90             84               460
  2          74             58               433
  3          70             68               365
  4          75             88               376
  5          68             88               419
  6          58             87               330
  7          80             48               362
  8          87             53               381
  9          80             72               431
 10         120             88              1038
 11          72             96               482
 12          60             59               322
 13          72             77               319
 14          72             62               381
 15          80             86               526
 16          80             84               466
 17          90             78               558
 18          55             85               281
 19          60             72               335
 20          58             92               329
 21          90             85               433
 22          85             93               508
 23          74             94               521
 24         110             72               655
 25          86             73               531

* Data for only the first 25 of 93 plots examined by Gingrich and Meyer. 

Consider now the simplest and most common model for multiple regression, 
namely, the model where the random response, $y$, is linearly dependent 
on two or more variables, $x_1, x_2, \ldots, x_p$. We discuss the cases where 
$p = 2$ or 3, extensions to more than three independent variables being 
obvious. 

When $p = 2$, we assume each observation $y_u$ to be randomly drawn 
from a normal distribution with mean 

$$\eta_u = \mu_{y/x_1 x_2} = \alpha + \beta_1 x_{1u} + \beta_2 x_{2u} \quad (u = 1, \ldots, n) \tag{14.62}$$

and variance $\sigma^2$. Estimators $a$, $b_1$, and $b_2$ of $\alpha$, 
$\beta_1$, and $\beta_2$ are found so as to minimize 

$$Q = \sum_u (y_u - a - b_1 x_{1u} - b_2 x_{2u})^2 \tag{14.63}$$

Setting the partial derivatives with respect to $a$, $b_1$, and $b_2$ equal to 
zero leads to the normal equations 

$$\begin{aligned}
na + \Big(\sum x_{1u}\Big)b_1 + \Big(\sum x_{2u}\Big)b_2 &= \sum y_u \\
\Big(\sum x_{1u}\Big)a + \Big(\sum x_{1u}^2\Big)b_1 + \Big(\sum x_{1u}x_{2u}\Big)b_2 &= \sum x_{1u}y_u \\
\Big(\sum x_{2u}\Big)a + \Big(\sum x_{1u}x_{2u}\Big)b_1 + \Big(\sum x_{2u}^2\Big)b_2 &= \sum x_{2u}y_u
\end{aligned} \tag{14.64}$$

The solution of these equations gives unbiased estimators of the parameters 
$\alpha$, $\beta_1$, and $\beta_2$, and, furthermore, it can be shown (see 
Gauss-Markoff theorem [15, 32, 49]) that these estimators have minimum variance 
of all estimators which are linear combinations of the observations. The 
estimated regression of $y$ on $x_1$ and $x_2$ is given by 

$$\hat\eta = a + b_1 x_1 + b_2 x_2 \tag{14.65}$$

By the methods of Chap. 11 it can be shown that an unbiased estimator of 
$\sigma^2$ is given by 

$$s^2_{y/x_1 x_2} = \frac{\sum_u (y_u - \hat\eta_u)^2}{n - 3} \tag{14.66}$$

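The solution of the system of equations in (14.64) can be found by any 
of the methods of solving simultaneous linear equations. We solve by first 
eliminating $a$ and then applying Cramer's rule with determinants. Solving for 
$a$ in the first equation of (14.64) gives 

$$a = \bar y - \bar x_1 b_1 - \bar x_2 b_2 \tag{14.67}$$

where 

$$\bar y = \frac{\sum y_u}{n}, \quad \bar x_1 = \frac{\sum x_{1u}}{n}, \quad \text{and} \quad \bar x_2 = \frac{\sum x_{2u}}{n}$$

Substituting Eq. (14.67) in the second and third equations of (14.64) gives 

$$\begin{aligned}
\Big(\sum x_{1u}\Big)(\bar y - \bar x_1 b_1 - \bar x_2 b_2) + \Big(\sum x_{1u}^2\Big)b_1 + \Big(\sum x_{1u}x_{2u}\Big)b_2 &= \sum x_{1u}y_u \\
\Big(\sum x_{2u}\Big)(\bar y - \bar x_1 b_1 - \bar x_2 b_2) + \Big(\sum x_{1u}x_{2u}\Big)b_1 + \Big(\sum x_{2u}^2\Big)b_2 &= \sum x_{2u}y_u
\end{aligned}$$

and, on further reduction, this leads to 

$$\begin{aligned}
a_{11}b_1 + a_{12}b_2 &= g_1 \\
a_{21}b_1 + a_{22}b_2 &= g_2
\end{aligned} \tag{14.68}$$

where 

$$\begin{aligned}
a_{ii} &= SSx_i = \sum x_{iu}^2 - \frac{(\sum x_{iu})^2}{n} = \sum (x_{iu} - \bar x_i)^2 \quad (i = 1, 2) \\
a_{12} &= a_{21} = SP_{12} = \sum x_{1u}x_{2u} - \frac{(\sum x_{1u})(\sum x_{2u})}{n} = \sum (x_{1u} - \bar x_1)(x_{2u} - \bar x_2) \\
g_i &= SP_{iy} = \sum x_{iu}y_u - \frac{(\sum x_{iu})(\sum y_u)}{n} = \sum (x_{iu} - \bar x_i)(y_u - \bar y) \quad (i = 1, 2)
\end{aligned}$$

Systems (14.64) and (14.68) are similar, except that in system (14.68) the sum 
of squares and products are for deviates instead of original observations. 
Note that if the true regression of $y$ on $x_1$ and $x_2$ had been written as 

$$\eta_u = \gamma + \beta_1(x_{1u} - \bar x_1) + \beta_2(x_{2u} - \bar x_2) \tag{14.69}$$

and the estimators $b_1$ and $b_2$ of $\beta_1$ and $\beta_2$, respectively, 
had been found so as to minimize 

$$Q = \sum_u [y_u - \bar y - b_1(x_{1u} - \bar x_1) - b_2(x_{2u} - \bar x_2)]^2$$

the resulting normal equations would be those given in (14.68). Further, 
$Q$ may be expressed as 

$$Q = \sum_u (y_u - \hat\eta_u)^2 = SSy - g_1 b_1 - g_2 b_2 \tag{14.70}$$

where 

$$\hat\eta_u = \bar y + b_1(x_{1u} - \bar x_1) + b_2(x_{2u} - \bar x_2)$$

The residual sum of squares is usually computed by (14.70). 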
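
The reduction to system (14.68), its solution by Cramer's rule, and the computational form (14.70) can be sketched in Python for the 25 plots of Table 14.6. This is an illustration of the procedure, not a reproduction of any result printed in the text; the helper `dev_sp` and the other names are our own.

```python
# Stand height (x1), crown cover (x2), and volume (y) for the 25 plots of Table 14.6.
x1 = [90, 74, 70, 75, 68, 58, 80, 87, 80, 120, 72, 60, 72,
      72, 80, 80, 90, 55, 60, 58, 90, 85, 74, 110, 86]
x2 = [84, 58, 68, 88, 88, 87, 48, 53, 72, 88, 96, 59, 77,
      62, 86, 84, 78, 85, 72, 92, 85, 93, 94, 72, 73]
y = [460, 433, 365, 376, 419, 330, 362, 381, 431, 1038, 482, 322, 319,
     381, 526, 466, 558, 281, 335, 329, 433, 508, 521, 655, 531]
n = len(y)


def dev_sp(u, v):
    """Sum of products of deviates: sum((u - u_bar) * (v - v_bar))."""
    return sum(ui * vi for ui, vi in zip(u, v)) - sum(u) * sum(v) / len(u)


# Coefficients and right-hand sides of system (14.68).
a11, a12, a22 = dev_sp(x1, x1), dev_sp(x1, x2), dev_sp(x2, x2)
g1, g2 = dev_sp(x1, y), dev_sp(x2, y)
ss_y = dev_sp(y, y)

# Cramer's rule for the two-by-two system.
delta = a11 * a22 - a12 * a12
b1 = (g1 * a22 - a12 * g2) / delta
b2 = (a11 * g2 - a12 * g1) / delta
a = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n   # Eq. (14.67)

q = ss_y - g1 * b1 - g2 * b2    # residual sum of squares, Eq. (14.70)
s2 = q / (n - 3)                # Eq. (14.66)

print("a =", a, " b1 =", b1, " b2 =", b2)
print("residual SS =", q, " residual mean square =", s2)
```

A useful check on such a computation is that the residuals sum to zero and are orthogonal to both $x_1$ and $x_2$, which is exactly what the normal equations (14.64) require.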

The regression coefficients are found by solving (14.68). There are many 
compact methods [8, 17, 19, 21, 22, 23, 29, 56] of solving normal equations. 
But since our primary concern is in finding interval estimates and in testing 
hypotheses afiout the /9’s, we use a method which is also very useful in 
determining the distribution of 6,. 

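In order to see the generality of the method, consider the case where
$p = 3$. Assume that $y_u$ is a random normal variable with mean

$$\eta_u = \alpha + \beta_1x_{1u} + \beta_2x_{2u} + \beta_3x_{3u}$$

and variance $\sigma^2$. The normal equations are

$$\begin{cases} a_{11}b_1 + a_{12}b_2 + a_{13}b_3 = g_1 \\ a_{21}b_1 + a_{22}b_2 + a_{23}b_3 = g_2 \\ a_{31}b_1 + a_{32}b_2 + a_{33}b_3 = g_3 \end{cases} \tag{14.71}$$

where $b_i$ estimates $\beta_i$ $(i = 1, 2, 3)$ and

$$\begin{cases} a_{ii} = \sum_u (x_{iu} - \bar{x}_i)^2 \\ a_{ij} = a_{ji} = \sum_u (x_{iu} - \bar{x}_i)(x_{ju} - \bar{x}_j) \qquad (i \neq j) \\ g_i = \sum_u (x_{iu} - \bar{x}_i)(y_u - \bar{y}) \end{cases} \tag{14.72}$$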

Let $\Delta$ denote the determinant of the matrix, $A$, of coefficients on the left-
hand side of (14.71), that is

$$\Delta = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

When $\Delta \neq 0$, the solution of (14.71), when Cramer's rule is applied, is

$$b_1 = \frac{\Delta_1}{\Delta}, \qquad b_2 = \frac{\Delta_2}{\Delta}, \qquad b_3 = \frac{\Delta_3}{\Delta} \tag{14.73}$$

where $\Delta_i$ denotes the determinant which results by replacing the $i$th column
of $\Delta$ by the column of $g$'s in (14.71).
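The rule in (14.73) can be sketched in a few lines of code. This is only an illustration under stated assumptions: the helper names `det3` and `cramer3` are hypothetical, and the numbers are the small symmetric system that the chapter solves in Example 14.6, not regression data.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(A, g):
    """Solve A b = g for three unknowns by Cramer's rule, as in (14.73)."""
    delta = det3(A)
    if delta == 0:
        raise ValueError("determinant is zero; no unique solution")
    solution = []
    for i in range(3):
        # Delta_i: replace the ith column of A by the column of g's.
        A_i = [row[:i] + [g[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(det3(A_i) / delta)
    return solution


# Illustrative symmetric system (the one used in Example 14.6).
A = [[5, 3, 2], [3, 4, 1], [2, 1, 3]]
g = [5, -2, 9]
b = cramer3(A, g)
print(b)
```

Cramer's rule is convenient for two or three unknowns; the number of determinants grows quickly with the number of variables, which is one reason the chapter turns to a more compact method later.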

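In order to determine the distribution of $b_i$ we express (14.73) as a linear
combination of $g$'s which are linear combinations of $y$'s. To illustrate, note
that by expanding $\Delta_1$ by the first column we may write

$$b_1 = c_{11}g_1 + c_{12}g_2 + c_{13}g_3 \tag{14.74}$$

where

$$c_{11} = \frac{1}{\Delta}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, \qquad c_{12} = -\frac{1}{\Delta}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}, \qquad c_{13} = \frac{1}{\Delta}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}$$

In a similar way, we may write

$$\begin{cases} b_2 = c_{21}g_1 + c_{22}g_2 + c_{23}g_3 \\ b_3 = c_{31}g_1 + c_{32}g_2 + c_{33}g_3 \end{cases} \tag{14.75}$$

where

$$c_{ij} = (-1)^{i+j}\frac{\Delta_{ij}}{\Delta} \qquad (i = 1, 2, 3;\ j = 1, 2, 3)$$

$\Delta_{ij}$ being the two-by-two determinant obtained by deleting the $i$th column and
$j$th row of $\Delta$. Actually, $c_{11}, c_{12}, \ldots, c_{33}$ are numbers in the matrix, $C$, which
is the inverse of the matrix $A$. That is, the $c$'s may be found by solving the
matrix equation

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \cdot \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{14.76}$$

In particular, $c_{11}, c_{12}, c_{13}$ are found by solving the system of linear equations

$$\begin{cases} a_{11}c_{11} + a_{12}c_{12} + a_{13}c_{13} = 1 \\ a_{21}c_{11} + a_{22}c_{12} + a_{23}c_{13} = 0 \\ a_{31}c_{11} + a_{32}c_{12} + a_{33}c_{13} = 0 \end{cases} \tag{14.77}$$

The numbers $c_{21}, c_{22}, c_{23}$ and $c_{31}, c_{32}, c_{33}$ are obtained by solving two similar
systems of linear equations with the right-hand-side coefficients 1, 0, 0
replaced by 0, 1, 0 and 0, 0, 1, respectively. Now substituting $c_{11}, c_{12}, \ldots, c_{33}$
in Eqs. (14.74) and (14.75) gives $b_1$, $b_2$, and $b_3$, which may be used to write
the regression equation of $y$ on $x_1$, $x_2$, $x_3$ as

$$\hat{\eta} = a + b_1x_1 + b_2x_2 + b_3x_3 \tag{14.78}$$

or

$$\hat{\eta} = \bar{y} + b_1(x_1 - \bar{x}_1) + b_2(x_2 - \bar{x}_2) + b_3(x_3 - \bar{x}_3) \tag{14.78a}$$

where

$$a = \bar{y} - b_1\bar{x}_1 - b_2\bar{x}_2 - b_3\bar{x}_3$$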
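As a check on the relation between the $c$'s and the inverse matrix, the following sketch builds $C$ from two-by-two determinants and verifies that $AC$ is the identity matrix. It is a hedged illustration: the function names are hypothetical, exact fractions are used only to avoid rounding, and the matrix is the symmetric example that appears later in Example 14.6.

```python
from fractions import Fraction


def minor(m, row, col):
    """2x2 determinant left after deleting one row and one column of a 3x3."""
    rows = [r for r in range(3) if r != row]
    cols = [c for c in range(3) if c != col]
    return (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
            - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])


def inverse_by_cofactors(A):
    """c_ij = (-1)^(i+j) Delta_ij / Delta, deleting column i and row j."""
    delta = sum((-1) ** j * A[0][j] * minor(A, 0, j) for j in range(3))
    return [[Fraction((-1) ** (i + j) * minor(A, j, i), delta)
             for j in range(3)] for i in range(3)]


A = [[5, 3, 2], [3, 4, 1], [2, 1, 3]]
g = [5, -2, 9]
C = inverse_by_cofactors(A)

# b_i = c_i1 g_1 + c_i2 g_2 + c_i3 g_3, as in (14.74) and (14.75).
b = [sum(C[i][j] * g[j] for j in range(3)) for i in range(3)]

# A C should be the identity matrix, as in (14.76).
product = [[sum(A[i][k] * C[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(b)
```

Once $C$ is available, the same matrix serves both to solve for the $b$'s and, as shown next in the text, to give their variances and covariances.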

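In order to find the distribution of $b_1$, we first use Eq. (14.74) and the
definition of $g_i$ $(i = 1, 2, 3)$ in Eq. (14.72) to write

$$b_1 = c_{11}\sum_u (x_{1u} - \bar{x}_1)(y_u - \bar{y}) + c_{12}\sum_u (x_{2u} - \bar{x}_2)(y_u - \bar{y}) + c_{13}\sum_u (x_{3u} - \bar{x}_3)(y_u - \bar{y})$$

or

$$b_1 = \sum_u \big[c_{11}(x_{1u} - \bar{x}_1) + c_{12}(x_{2u} - \bar{x}_2) + c_{13}(x_{3u} - \bar{x}_3)\big]y_u \tag{14.79}$$

since

$$\sum_u \big[c_{11}(x_{1u} - \bar{x}_1) + c_{12}(x_{2u} - \bar{x}_2) + c_{13}(x_{3u} - \bar{x}_3)\big]\bar{y} = 0 \tag{14.80}$$

In Eq. (14.79) we see that $b_1$ is a linear combination of $y_1, \ldots, y_n$ which are
normally and independently distributed. Thus, when the $x$'s are fixed, $b_1$ is
also normally distributed. The mean of $b_1$ may be written as

$$\mu_{b_1} = \sum_u \big[c_{11}(x_{1u} - \bar{x}_1) + c_{12}(x_{2u} - \bar{x}_2) + c_{13}(x_{3u} - \bar{x}_3)\big](\alpha + \beta_1x_{1u} + \beta_2x_{2u} + \beta_3x_{3u})$$

$$= \alpha\sum_u \big[c_{11}(x_{1u} - \bar{x}_1) + c_{12}(x_{2u} - \bar{x}_2) + c_{13}(x_{3u} - \bar{x}_3)\big]$$
$$\quad + \beta_1(c_{11}a_{11} + c_{12}a_{12} + c_{13}a_{13})$$
$$\quad + \beta_2(c_{11}a_{21} + c_{12}a_{22} + c_{13}a_{23})$$
$$\quad + \beta_3(c_{11}a_{31} + c_{12}a_{32} + c_{13}a_{33})$$

or

$$\mu_{b_1} = \beta_1$$

by applying Eqs. (14.80) and (14.77). Since the $y$'s are independent, the
variance of $b_1$, if we use Theorem 6.1, is

$$\sigma_{b_1}^2 = \sum_u \big[c_{11}(x_{1u} - \bar{x}_1) + c_{12}(x_{2u} - \bar{x}_2) + c_{13}(x_{3u} - \bar{x}_3)\big]^2\sigma_{y_u}^2$$

Using the assumption of equal variances and properties of algebra, we may
write the variance of $b_1$ as

$$\sigma_{b_1}^2 = \sigma^2\big[c_{11}(c_{11}a_{11} + c_{12}a_{12} + c_{13}a_{13}) + c_{12}(c_{11}a_{21} + c_{12}a_{22} + c_{13}a_{23}) + c_{13}(c_{11}a_{31} + c_{12}a_{32} + c_{13}a_{33})\big]$$

or, on substituting the relations of (14.77), as

$$\sigma_{b_1}^2 = c_{11}\sigma^2$$

In a similar way we can show that $b_i$ is distributed normally with mean $\beta_i$
and variance $c_{ii}\sigma^2$ $(i = 2, 3)$. But $b_1$, $b_2$, and $b_3$ are not generally independently
distributed, since it can be shown that

$$\operatorname{cov}(b_1, b_2) = c_{12}\sigma^2, \qquad \operatorname{cov}(b_1, b_3) = c_{13}\sigma^2, \qquad \operatorname{cov}(b_2, b_3) = c_{23}\sigma^2$$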

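If, as is ordinarily the case, $\sigma^2$ is unknown, one uses the unbiased
estimator

$$s_{y \cdot x}^2 = \frac{\sum_u (y_u - \hat{\eta}_u)^2}{n - 4}$$

where

$$\sum_u (y_u - \hat{\eta}_u)^2 = SSy - g_1b_1 - g_2b_2 - g_3b_3$$

Then it follows that

$$\frac{(n - 4)s_{y \cdot x}^2}{\sigma^2} \tag{14.81}$$

is distributed as $\chi^2$ with $n - 4$ degrees of freedom, and

$$t = \frac{b_i - \beta_i}{s_{y \cdot x}\sqrt{c_{ii}}} \qquad (i = 1, 2, 3) \tag{14.82}$$

is distributed as the Student $t$ with $n - 4$ degrees of freedom. Then (14.82)
may be used to test any hypothesis of the form $H_0: \beta_i = \beta_{i0}$, where $\beta_{i0}$ is any
constant. The $100(1 - \alpha)$ per cent symmetric confidence interval of $\beta_i$ is
given by

$$b_i - t_{\alpha/2}(n - 4)\,s_{y \cdot x}\sqrt{c_{ii}} < \beta_i < b_i + t_{\alpha/2}(n - 4)\,s_{y \cdot x}\sqrt{c_{ii}} \tag{14.83}$$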
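The statistic in (14.82) and the interval in (14.83) amount to a few arithmetic steps, sketched below. The function name is hypothetical, and the tabled $t$ value must be supplied by the user; the numbers in the usage lines are those quoted for $b_1$ in Example 14.4 of this chapter.

```python
import math


def coefficient_t_and_interval(b_i, c_ii, s2, t_crit, beta_0=0.0):
    """Return the t statistic of (14.82) and the interval of (14.83)."""
    s_b = math.sqrt(c_ii * s2)          # estimated standard error of b_i
    t = (b_i - beta_0) / s_b
    half_width = t_crit * s_b
    return t, (b_i - half_width, b_i + half_width)


# Values quoted in Example 14.4; t_.025(22) = 2.074 is read from a t table.
t, (lower, upper) = coefficient_t_and_interval(
    b_i=8.340, c_ii=0.000174852, s2=5113.49, t_crit=2.074)
print(round(t, 2), round(lower, 2), round(upper, 2))
```

The degrees of freedom enter only through the tabled value `t_crit`, so the same function applies to any number of independent variables.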

It can easily be shown that

$$F = \frac{(g_1b_1 + g_2b_2 + g_3b_3)/3}{s_{y \cdot x}^2} \tag{14.84}$$

is distributed as $F$ with three and $n - 4$ degrees of freedom, and that (14.84)
is the appropriate statistic to use in testing the common hypothesis

$$H_0: \beta_1 = \beta_2 = \beta_3 = 0$$

However, a test involving two $\beta$ parameters is not straightforward. For more
information on this topic, see Exercises 14.14 and 14.15 and Refs. [5, 6, 9,
32, 36, 59].

Example 14.4. Use the data in Table 14.6 to illustrate applications of
multiple linear regression. (a) Find the linear equation for regression of $y$
on $x_1$ and $x_2$. (b) Test the hypothesis $\beta_1 = \beta_2 = 0$ at the five per cent level.
(c) Find the inverse matrix. (d) Test the hypothesis $\beta_1 = 0$ at the five per
cent level, and find a 95 per cent confidence interval for $\beta_1$.

Table 14.6 shows only a fraction of the observations which were made.
According to Gingrich and Meyer [31, p. 140], “A total of 93 one-fifth acre 
plots were randomly located on 1 : 12,000-scale photographs of upland oak 
stands in Centre County, Pennsylvania. The photographs were taken on 
infrared film with a minus-blue filter. Contact prints were made on semi-matte 
paper.” After carefully locating each plot, a two-man crew measured at chest 
height the diameter of every tree equal to or greater than five inches. Standard 
tables were then used to estimate the gross cubic-foot volume, $y_u$, of the $u$th
plot. The stand height, $x_{1u}$, is the mean of the three tallest trees in plot $u$.
The relative crown cover, $x_{2u}$, is the mean of six measurements made by three
different observers using a standard crown diameter scale. Several other 
ground and aerial measurements were made, but we study only these three 
in order to determine how gross volume may be estimated from stand height 
and relative crown cover obtained from aerial photographs. (Since only a 
fraction of the available data is used to illustrate techniques in multiple 
regression, the reader should not expect the results of the example necessarily 
to be the same as those reported by the authors [31].) 

The sums, sums of squares, and sums of products are

    Σ x_1u     = 1946        Σ x_2u     = 1942        Σ y_u      = 11,302
    Σ x_1u^2   = 157,196     Σ x_2u^2   = 155,244     Σ y_u^2    = 5,654,974
    Σ x_1u x_2u = 151,133    Σ x_1u y_u = 927,352     Σ x_2u y_u = 890,387

The sums of squares and sums of products about the means are

    a_11 = SSx_1 = 5719.36        a_12 = SPx_1x_2 = -32.28
    a_22 = SSx_2 = 4389.44        g_1 = SPx_1y = 47,604.32
    SSy = 545,565.84              g_2 = SPx_2y = 12,447.64

Thus, the normal equations are

$$\begin{cases} 5719.36\,b_1 + (-32.28)\,b_2 = 47{,}604.32 \\ -32.28\,b_1 + 4389.44\,b_2 = 12{,}447.64 \end{cases}$$

The determinant of the coefficients is

$$\Delta = (5719.36)(4389.44) - (-32.28)^2 = 25{,}103{,}745.56$$

Therefore

$$b_1 = \frac{\Delta_1}{\Delta} = 8.340 \qquad \text{and} \qquad b_2 = \frac{\Delta_2}{\Delta} = 2.897$$

since

$$\Delta_1 = \begin{vmatrix} 47{,}604.32 & -32.28 \\ 12{,}447.64 & 4389.44 \end{vmatrix} = 209{,}358{,}116.20$$

$$\Delta_2 = \begin{vmatrix} 5719.36 & 47{,}604.32 \\ -32.28 & 12{,}447.64 \end{vmatrix} = 72{,}729{,}201.76$$

Applying Eq. (14.67) gives

$$a = 452.08 - 77.84(8.340) - 77.68(2.897) = -422.1$$

Finally, substituting $a$, $b_1$, and $b_2$ in Eq. (14.65), we find the estimated
regression of $y$ on $x_1$ and $x_2$ to be

$$\hat{\eta} = -422.1 + 8.340x_1 + 2.897x_2 \tag{14.85}$$
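The arithmetic of part (a) can be retraced from the raw sums with a short script. This is a sketch, not the authors' computing procedure. One assumption is labeled in the code: the number of plots used, $n = 25$, is not stated in this passage and is inferred from the means (for instance $1946/25 = 77.84$) and the 22 residual degrees of freedom used later.

```python
n = 25  # assumption: inferred from 1946 / 25 = 77.84, the mean used in the text

sum_x1, sum_x2, sum_y = 1946, 1942, 11302
sum_x1x1, sum_x2x2, sum_yy = 157196, 155244, 5654974
sum_x1x2, sum_x1y, sum_x2y = 151133, 927352, 890387

# Sums of squares and products about the means, as in (14.68).
a11 = sum_x1x1 - sum_x1 ** 2 / n
a22 = sum_x2x2 - sum_x2 ** 2 / n
a12 = sum_x1x2 - sum_x1 * sum_x2 / n
g1 = sum_x1y - sum_x1 * sum_y / n
g2 = sum_x2y - sum_x2 * sum_y / n
ss_y = sum_yy - sum_y ** 2 / n

# Cramer's rule for two unknowns.
delta = a11 * a22 - a12 ** 2
b1 = (g1 * a22 - a12 * g2) / delta
b2 = (a11 * g2 - a12 * g1) / delta

# Intercept from Eq. (14.67).
a = sum_y / n - (sum_x1 / n) * b1 - (sum_x2 / n) * b2
print(round(b1, 3), round(b2, 3), round(a, 1))
```

Working from the deviate sums rather than the raw sums keeps the normal equations to two unknowns, with the intercept recovered at the end.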

Equation (14.85) may be very useful in obtaining estimates of means of
arrays of $y$'s corresponding to any pair of values $x_1$, $x_2$. But one does not
usually stop here, since it is important to know whether the linear trend is
significant, and, if so, which variables contribute most to the trend. For
example, if there is no need to take into account the regression of some of
the variables $x_i$, then considerably less computation is involved.
In order to test the null hypothesis

$$H_0: \beta_1 = \beta_2 = 0$$

we find

$$SS_{\text{reg}} = \sum_i g_ib_i = 433{,}069$$
$$SS_{\text{res}} = SSy - SS_{\text{reg}} = 112{,}497$$

and, therefore

$$F = \frac{SS_{\text{reg}}/2}{SS_{\text{res}}/22} = 42.3$$

Since $F_{.005}(2, 22) = 6.81$, we reject the null hypothesis at the 0.5 per cent
level and conclude that either $\beta_1$ or $\beta_2$ or both $\beta_1$ and $\beta_2$ are significantly
different from zero. That is, we conclude that there is a linear trend which
depends on stand height or on crown cover or on both stand height and
crown cover.
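The test in part (b) can be retraced as follows. The sketch recomputes $b_1$ and $b_2$ from the deviate sums quoted in the example so that rounding does not accumulate; the variable names are illustrative, and $n = 25$ is an inference from the 22 residual degrees of freedom rather than a figure stated in this passage.

```python
a11, a12, a22 = 5719.36, -32.28, 4389.44
g1, g2 = 47604.32, 12447.64
ss_y = 545565.84
n, p = 25, 2          # assumption: n inferred from the residual d.f. of 22

delta = a11 * a22 - a12 ** 2
b1 = (g1 * a22 - a12 * g2) / delta
b2 = (a11 * g2 - a12 * g1) / delta

ss_reg = g1 * b1 + g2 * b2          # regression sum of squares
ss_res = ss_y - ss_reg              # residual sum of squares, as in (14.70)
df_res = n - p - 1
f_ratio = (ss_reg / p) / (ss_res / df_res)
print(round(ss_reg), round(ss_res), round(f_ratio, 1))
```

The ratio is compared with the tabled value $F_{.005}(2, 22) = 6.81$ quoted in the text.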

Often we are interested in the contribution of a specific variable $x_i$. This
is determined by testing a hypothesis or establishing a confidence interval
for a specific coefficient $\beta_i$. For such purposes it is convenient to have the
inverse matrix, $C$, of the coefficient matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} 5719.36 & -32.28 \\ -32.28 & 4389.44 \end{bmatrix}$$

The inverse matrix of $A$ is particularly easy to compute, since

$$c_{11} = \frac{a_{22}}{\Delta}, \qquad c_{22} = \frac{a_{11}}{\Delta}, \qquad c_{12} = c_{21} = \frac{(-1)a_{12}}{\Delta}$$

Therefore

$$c_{11} = \frac{4389.44}{25{,}103{,}746} = 0.000174852$$

$$c_{22} = \frac{5719.36}{25{,}103{,}746} = 0.000227829$$

$$c_{12} = c_{21} = \frac{(-1)(-32.28)}{25{,}103{,}746} = 0.000001286$$

and

$$C = 10^{-6}\begin{bmatrix} 174.852 & 1.286 \\ 1.286 & 227.829 \end{bmatrix}$$

Since this method of finding an inverse is not very good for higher-order
matrices, we describe a more compact method in Sect. 14.6.

The error variance estimate is

$$s_{y \cdot x}^2 = \frac{SS_{\text{res}}}{22} = 5113.49$$

Thus

$$s_{b_1} = \sqrt{c_{11}s_{y \cdot x}^2} = \sqrt{0.894103} = 0.9456$$

To test the null hypothesis $\beta_1 = 0$, we compare

$$t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{8.340}{0.9456} = 8.82$$

with $t_{.025}(22) = 2.074$. Since $8.82 > 2.07$, we reject the null hypothesis and
conclude that there is a positive slope in the $x_1$ direction. Further, the 95
per cent confidence interval for $\beta_1$ is

$$6.38 < \beta_1 < 10.30$$

since

$$t_{.025}(22) \cdot s_{b_1} = 1.96$$

Actually, since both limits are positive, we could use this information to reject
the null hypothesis and conclude that $\beta_1$ is positive.
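Parts (c) and (d) can be sketched together: the two-by-two inverse gives the standard errors of both coefficients and their estimated covariance. The code below is an illustration using the figures quoted in the example; variable names are hypothetical.

```python
import math

a11, a12, a22 = 5719.36, -32.28, 4389.44
s2 = 5113.49          # error variance estimate quoted in the text
b2 = 2.897            # coefficient of x_2 from Eq. (14.85)

# Inverse of a symmetric 2x2 matrix.
delta = a11 * a22 - a12 * a12
c11 = a22 / delta
c22 = a11 / delta
c12 = -a12 / delta

s_b1 = math.sqrt(c11 * s2)      # standard error of b_1
s_b2 = math.sqrt(c22 * s2)      # standard error of b_2
t_b2 = b2 / s_b2                # statistic for testing beta_2 = 0
cov_b1_b2 = c12 * s2            # estimated covariance of b_1 and b_2
print(round(s_b1, 4), round(s_b2, 3), round(t_b2, 2), round(cov_b1_b2, 6))
```

A nonzero `cov_b1_b2` is the numerical signal that the two coefficients are not independently distributed, which matters when one variable is dropped from the equation.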

Also, note that the null hypothesis $\beta_2 = 0$ is rejected at the five per cent
level, since the statistic

$$t = \frac{b_2}{s_{b_2}} = \frac{2.897}{1.079} = 2.68$$

is larger than $t_{.025}(22) = 2.074$. However, if $|b_2|/s_{b_2}$ had been in the neighbor-
hood of 1 or less, say, we might argue that gross volume per plot depends only
on stand height. This being the case, we would require a regression equation
of $y$ on $x_1$. Since

$$\operatorname{cov}(b_1, b_2) = c_{12}s_{y \cdot x}^2 = 0.006575 \neq 0$$

we suspect that $b_1$ and $b_2$ are not independently distributed. (When $c_{12}\sigma^2 \neq 0$
we know that $b_1$ and $b_2$ are not independently distributed.) In case $b_1$ and $b_2$
are not independently distributed, we cannot find the required regression
equation by simply dropping the last term in Eq. (14.65) and writing

$$\hat{\eta} = -422.1 + 8.340x_1$$

The proper regression line is found by ignoring the $x_2$ values and fitting $y$
on $x_1$ in the usual manner. This leads to estimates

$$b = \frac{SPx_1y}{SSx_1} = 8.324$$

and

$$a = 452.08 - (77.84)(8.324) = -195.8$$

and regression equation

$$\hat{\eta} = -195.86 + 8.324x_1$$

14.5. POLYNOMIAL REGRESSION

In many applications the regression of $y$ on $x$ is assumed to be a poly-
nomial function. Thus, for our model we assume that each observation $y_u$
is randomly drawn from a normal distribution with mean

$$\eta_u = \mu_{y|x_u} = \alpha + \beta_1x_u + \cdots + \beta_qx_u^q \qquad (u = 1, \ldots, n) \tag{14.86}$$

and variance $\sigma^2$, where $q$ is the degree of the polynomial to be fitted. Equation
(14.86) can be viewed as a special case of multiple linear regression. For if

$$p_{iu} = x_u^i \qquad (i = 1, \ldots, q)$$

is substituted in Eq. (14.86), the regression equation becomes

$$\eta_u = \alpha + \beta_1p_{1u} + \cdots + \beta_qp_{qu} \tag{14.87}$$

which is linear in $p_1, \ldots, p_q$. Hence, the methods of analysis of multiple
linear regression may be applied. A sum of products becomes a sum of
a power of $x_u$. For example

$$\sum_u p_{iu}p_{ju} = \sum_u x_u^ix_u^j = \sum_u x_u^{i+j}$$

Thus, in addition to solving normal equations, one must also find sums of
powers of $x_u$ from 1 to $2q$ when fitting a $q$th degree polynomial. This means
that the amount of computation increases rapidly as the degree of the
polynomial increases. For this reason, along with others, it is desirable that
an experiment be planned so that shortcut techniques can be employed.
The method of orthogonal polynomials, described below, is very useful in this
regard.
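The remark about sums of powers can be made concrete. The sketch below forms the normal equations for a $q$th degree polynomial directly from the power sums $\sum x_u^k$ $(k = 0, \ldots, 2q)$ and solves them by elimination with exact fractions. The function names are hypothetical, and the data are the eight hypothetical observations that the chapter uses later in Example 14.5.

```python
from fractions import Fraction


def solve(M, rhs):
    """Gauss-Jordan elimination with exact fractions."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def fit_polynomial(x, y, q):
    """Least squares coefficients [a, b_1, ..., b_q] in powers of x."""
    # Sums of powers of x from 0 to 2q, and sums of x^k y from 0 to q.
    power_sums = [sum(xu ** k for xu in x) for k in range(2 * q + 1)]
    moment_sums = [sum(xu ** k * yu for xu, yu in zip(x, y)) for k in range(q + 1)]
    normal = [[power_sums[i + j] for j in range(q + 1)] for i in range(q + 1)]
    return solve(normal, moment_sums)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 7, 11, 14, 16, 12, 13, 11]
coefficients = fit_polynomial(x, y, 2)
print([round(float(c), 3) for c in coefficients])
```

Raising the degree by one adds a row and column to the normal equations and two more power sums, and the whole system must be solved again; this is the labor that orthogonal polynomials avoid.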

On finding the regression of $y$ on $x_1$ at the end of Example 14.4, we did
not use the regression of $y$ on $x_1$ and $x_2$. Instead, we started again with the
raw data $(x_{11}, y_1), \ldots, (x_{1n}, y_n)$ and solved a new set of normal
equations. In general, when we use the methods already presented, it is neces-
sary that we start with the raw data each time we fit a regression equation
on some of the variables $x_1, x_2, \ldots, x_p$. However, there are methods which
allow us, in certain cases, to use an equation already fitted in determining
the equation of best fit on fewer or more variables. We now outline and
illustrate such a procedure, the method of orthogonal polynomials, for
polynomial regression when the $x_u$ are equally spaced and only one $y_u$ is
associated with each $x_u$. Further details, along with applications, may be
found in Refs. [2, 3, 5, 9, 10, 37] and the exercises.

If the values of $x$ are at unit intervals, it is always possible to express
Eq. (14.87) in the form

$$\eta_u = \alpha' + \beta_1'p_{1u}' + \cdots + \beta_q'p_{qu}' \qquad (u = 1, \ldots, n) \tag{14.88}$$

where $\alpha', \beta_1', \ldots, \beta_q'$ are parameters, and $p_1', \ldots, p_q'$ are orthogonal poly-
nomials in $x$ of degrees $1, \ldots, q$, respectively. Two polynomials $p_i'$ and
$p_j'$ $(i \neq j)$ are orthogonal when

$$\sum_u p_{iu}'p_{ju}' = 0 \tag{14.89}$$

that is, when the products of the polynomials evaluated at $x_1, \ldots, x_n$ sum
to zero. The first three orthogonal polynomials are

$$\begin{cases} p_1' = \lambda_1(x - \bar{x}) \\ p_2' = \lambda_2\Big[(x - \bar{x})^2 - \dfrac{n^2 - 1}{12}\Big] \\ p_3' = \lambda_3\Big[(x - \bar{x})^3 - (x - \bar{x})\dfrac{3n^2 - 7}{20}\Big] \end{cases} \tag{14.90}$$

where for fixed $n$ the $\lambda_i$ are constants chosen so that values of $p_i'$ are integers
reduced to their lowest terms. If $p_i$ $(i = 1, 2, 3)$ denotes the polynomial which
multiplies $\lambda_i$ on the right-hand side of (14.90), other polynomials may be
obtained from the recursion formula

$$p_{i+1} = p_1p_i - \frac{i^2(n^2 - i^2)}{4(4i^2 - 1)}p_{i-1} \qquad (i = 2, 3, \ldots, q - 1) \tag{14.91}$$

If $x_0$ is any integer so that $x_0$, $x_0 + 1$, $x_0 + 2$, $x_0 + 3$ are four consecutive
integers, it is easy to compute the values in Table 14.7. Other values of
orthogonal polynomials are shown in Table 14.8. Values of $p_{iu}'$ for $n = 3$ to
75 and $q = 1$ to 5 are given by Fisher and Yates [27]. These tables have been
extended to $n = 104$ by Anderson and Houseman [4].
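Tabled values of this kind can be regenerated for any $n$. The sketch below assumes the standard recursion for equally spaced points, $p_{i+1} = p_1p_i - \frac{i^2(n^2 - i^2)}{4(4i^2 - 1)}p_{i-1}$ with $p_0 = 1$ and $p_1 = x - \bar{x}$, and then scales each polynomial to integers in lowest terms. The function name is hypothetical, and the routine assumes $q < n$.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def orthogonal_polynomials(n, q):
    """Integer-scaled orthogonal polynomial values at n unit-spaced points.

    Returns (values, lambdas): values[i - 1] lists p'_i at the n points and
    lambdas[i - 1] is the multiplier that makes the values integers.
    """
    mean = Fraction(n + 1, 2)
    first = [Fraction(x) - mean for x in range(1, n + 1)]   # p_1 = x - x_bar
    previous = [Fraction(1)] * n                            # p_0 = 1
    current = first
    values, lambdas = [], []
    for i in range(1, q + 1):
        denominator = reduce(lcm, (v.denominator for v in current), 1)
        integers = [int(v * denominator) for v in current]
        common = reduce(gcd, (abs(v) for v in integers))
        values.append([v // common for v in integers])
        lambdas.append(Fraction(denominator, common))
        # Recursion step giving p_{i+1} from p_i and p_{i-1}.
        factor = Fraction(i * i * (n * n - i * i), 4 * (4 * i * i - 1))
        nxt = [f * c - factor * p for f, c, p in zip(first, current, previous)]
        previous, current = current, nxt
    return values, lambdas


values, lambdas = orthogonal_polynomials(4, 3)
for row in values:
    print(row)
print(lambdas)
```

For $n = 4$ the routine reproduces the integer columns and the multipliers 2, 1, 10/3 of Table 14.7; calling it with `n=8, q=4` gives the polynomial values used in Example 14.5.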


Table 14.7

Values of Polynomials when n = 4

    Consecutive     p_1        p_2                 p_3                          p'_1     p'_2     p'_3
    integers      x - x̄    (x - x̄)^2 - 5/4   (x - x̄)^3 - (41/20)(x - x̄)      2p_1     1p_2   (10/3)p_3

    x_0           -3/2         1               -3/10                          -3        1       -1
    x_0 + 1       -1/2        -1                9/10                          -1       -1        3
    x_0 + 2        1/2        -1               -9/10                           1       -1       -3
    x_0 + 3        3/2         1                3/10                           3        1        1


Table 14.8

Values of Orthogonal Polynomials for n = 3, 4, 5, 6

    Consecutive  |   n = 3    |     n = 4       |        n = 5          |          n = 6
    integers     | p'_1 p'_2  | p'_1 p'_2 p'_3  | p'_1 p'_2 p'_3 p'_4   | p'_1 p'_2 p'_3 p'_4 p'_5

    x_0          |  -1    1   |  -3    1   -1   |  -2    2   -1    1    |  -5    5   -5    1   -1
    x_0 + 1      |   0   -2   |  -1   -1    3   |  -1   -1    2   -4    |  -3   -1    7   -3    5
    x_0 + 2      |   1    1   |   1   -1   -3   |   0   -2    0    6    |  -1   -4    4    2  -10
    x_0 + 3      |            |   3    1    1   |   1   -1   -2   -4    |   1   -4   -4    2   10
    x_0 + 4      |            |                 |   2    2    1    1    |   3   -1   -7   -3   -5
    x_0 + 5      |            |                 |                       |   5    5    5    1    1

    Σ p'_iu^2    |   2    6   |  20    4   20   |  10   14   10   70    |  70   84  180   28  252
    λ_i          |   1    3   |   2    1  10/3  |   1    1   5/6 35/12  |   2  3/2  5/3 7/12 21/10


The least squares estimators $a', b_1', \ldots, b_q'$ of $\alpha', \beta_1', \ldots, \beta_q'$, respectively,
are found by solving the normal equations

$$\begin{cases} na' + 0 + \cdots + 0 = \sum_u y_u \\ 0 + \big(\sum_u p_{1u}'^2\big)b_1' + \cdots + 0 = \sum_u p_{1u}'y_u \\ \qquad \vdots \\ 0 + 0 + \cdots + \big(\sum_u p_{qu}'^2\big)b_q' = \sum_u p_{qu}'y_u \end{cases} \tag{14.92}$$

The ease with which the solutions are found is one of the advantages of
orthogonal polynomials. The solutions are

$$a' = \frac{\sum_u y_u}{n} = \bar{y}, \qquad b_i' = \frac{\sum_u p_{iu}'y_u}{\sum_u p_{iu}'^2} = \frac{\sum_u p_{iu}'y_u}{k_i} \qquad (i = 1, \ldots, q) \tag{14.93}$$

where $k_i = \sum_u p_{iu}'^2$.

A great advantage is due to the fact that the coefficients $a', b_1', \ldots, b_q'$
are independently distributed. Thus, any regression coefficient $b_i'$ and the
corresponding regression sum of squares

$$b_i'\sum_u p_{iu}'y_u = \frac{\big(\sum_u p_{iu}'y_u\big)^2}{k_i} = b_i'^2k_i \tag{14.94}$$

may be found directly. Also, the cubic regression equation, for example, can
be found directly from the quadratic regression equation by simply adding
on the term $b_3'p_3'$. Further, the significance of $b_3'$ can be assessed by com-
paring $b_3'^2k_3$ with the independently distributed residual mean square

$$\frac{SSy - b_1'^2k_1 - b_2'^2k_2 - b_3'^2k_3}{n - 1 - 3}$$

In general, the mean squares

$$b_i'^2k_i \qquad \text{and} \qquad \frac{SSy - b_1'^2k_1 - \cdots - b_i'^2k_i}{n - 1 - i} \tag{14.95}$$

are independently distributed.

If the degree of the polynomial is known to be (or there is strong evidence 
that it is) q, the procedures explained in multiple linear regression may be 
applied. However, if there is doubt as to the degree of the polynomial which 
fits a set of data (with equally spaced x’s), the procedure described in 
Example 14.5 should be used. 

Example 14.5. Find the simplest polynomial which adequately represents
the hypothetical data below

    x     1    2    3    4    5    6    7    8
    y     2    7   11   14   16   12   13   11

This example is given to illustrate the technique. The reader can supply
his own application. For example, one might think of this as coded data for

a. Average rainfall in eight consecutive months.
b. Density of a glass in terms of annealing temperature.
c. Yield of a crop in terms of distance in which rows are spaced.
d. Cost of living index over time.
e. Amount of assimilation of nitrogen in terms of amount of Chile
saltpeter administered.

The values of the orthogonal polynomials, along with the sums of
squares $k_i$, sums of products

$$\sum_u p_{iu}'y_u$$

and regression sum of squares due to $b_i'$ are shown in Table 14.9. The analysis
of variance is given in Table 14.10. When we look at the sequence of tests
indicated in Table 14.10, it is clear that the quadratic polynomial adequately
describes the data at the five per cent level. Actually, the fit is significant at
the 0.5 per cent level.


Table 14.9

Computations for Orthogonal Polynomials

    x               p'_1    p'_2    p'_3    p'_4      y     p'_1 y   p'_2 y   p'_3 y   p'_4 y

    1                -7       7      -7       7       2      -14       14      -14       14
    2                -5       1       5     -13       7      -35        7       35      -91
    3                -3      -3       7      -3      11      -33      -33       77      -33
    4                -1      -5       3       9      14      -14      -70       42      126
    5                 1      -5      -3       9      16       16      -80      -48      144
    6                 3      -3      -7      -3      12       36      -36      -84      -36
    7                 5       1      -5     -13      13       65       13      -65     -169
    8                 7       7       7       7      11       77       77       77       77

    k_i = Σ p'_iu^2  168     168     264     616
    Σ p'_iu y_u                                               98     -108       20       32
    (Σ p'_iu y_u)^2 / k_i                                   98^2/168 108^2/168 20^2/264 32^2/616
                                                            = 57.17  = 69.43   = 1.52   = 1.66


Two features of the test procedure should be noted. First, two consecutive
nonsignificant results were obtained after the last significant result. This was
done because odd-degree polynomials are very likely to be nonsignificant
when an even-degree polynomial is significant. Thus, if the sequence of the
tests had stopped after testing the cubic fit, we might have missed a significant
result. Second, the residual mean square changes with each test. However,
once a decision is made regarding the degree of the polynomial, the residual
mean square which gives the last significant result can be used to estimate
confidence intervals or to make other tests. For example, the residual mean
square for the quadratic term should be used to estimate a confidence
interval for a regression coefficient.

Table 14.10

Analysis of Variance for Example 14.5

    Source of Variation        Sum of Squares             Degrees of   Mean      F       F_.05
                                                          Freedom      Square

    Total                      Σ y_u^2 = 1060.00             8
    Mean (a' = ȳ)              (Σ y_u)^2/8 = 924.50          1
    Residual from mean         SSy = 135.50                  7

    Linear                     57.17                         1         57.17     4.38     5.99
    Residual from linear       78.33                         6         13.06

    Quadratic                  69.43                         1         69.43    39.0      6.61
    Residual from quadratic     8.90                         5          1.78

    Cubic                       1.52                         1          1.52     0.82     7.71
    Residual from cubic         7.38                         4          1.85

    Quartic                     1.66                         1          1.66     0.87    10.13
    Residual from quartic       5.72                         3          1.91

The reader should note that the procedure described in Example 14.5 
is for the case where the degree of the polynomial is uncertain. Otherwise, 
the proper residual mean square can be found directly and the usual pro- 
cedure for testing and estimating applied. 
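The sequence of tests in Example 14.5 can be sketched as a loop: each degree contributes its own sum of squares $(\sum p_{iu}'y_u)^2/k_i$, which is removed from the running residual before the $F$ ratio is formed. The polynomial values are the standard ones for $n = 8$; variable names are illustrative, and the tabled $F_{.05}$ values are left to the reader's tables.

```python
y = [2, 7, 11, 14, 16, 12, 13, 11]
polys = [
    [-7, -5, -3, -1, 1, 3, 5, 7],        # linear
    [7, 1, -3, -5, -5, -3, 1, 7],        # quadratic
    [-7, 5, 7, 3, -3, -7, -5, 7],        # cubic
    [7, -13, -3, 9, 9, -3, -13, 7],      # quartic
]

n = len(y)
ss_y = sum(v * v for v in y) - sum(y) ** 2 / n   # residual from mean
residual = ss_y
table = []
for degree, p in enumerate(polys, start=1):
    k = sum(v * v for v in p)                         # k_i
    sp = sum(pv * yv for pv, yv in zip(p, y))         # sum of p'_iu y_u
    ss = sp ** 2 / k                                  # regression SS for this degree
    residual -= ss
    df = n - 1 - degree
    f_ratio = ss / (residual / df)
    table.append((degree, ss, residual, df, f_ratio))

for degree, ss, res, df, f_ratio in table:
    print(degree, round(ss, 2), round(res, 2), df, round(f_ratio, 2))
```

Because the polynomials are orthogonal, adding a higher degree never changes the sums of squares already computed for lower degrees; only the residual line is updated.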

Using Eq. (14.93), we find regression coefficients to be

$$a' = \bar{y} = 10.750$$
$$b_1' = \frac{98}{168} = 0.583$$
$$b_2' = \frac{-108}{168} = -0.643$$

and the quadratic regression equation is

$$\hat{\eta} = 10.750 + 0.583p_1' - 0.643p_2' \tag{14.96}$$

where

$$\begin{cases} p_1' = 2(x - 4.5) = 2x - 9 \\ p_2' = (x - 4.5)^2 - \dfrac{21}{4} = x^2 - 9x + 15 \end{cases} \tag{14.97}$$

Substituting Eqs. (14.97) in Eq. (14.96) and simplifying gives the following
function in terms of $x$

$$\hat{\eta} = -4.143 + 6.953x - 0.643x^2$$

Note that by dropping the $p_2'$ term in Eq. (14.96) the resulting linear regres-
sion equation is

$$\hat{\eta} = 10.750 + 0.583(2x - 9)$$

or

$$\hat{\eta} = 5.503 + 1.166x$$

This is the same equation which would be obtained by fitting a simple linear
regression directly.
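The closing remark can be checked numerically. The sketch below compares the line obtained by dropping the quadratic term of the orthogonal fit with a direct simple linear regression on the same eight points, using exact fractions; names are illustrative. The exact intercept is 5.5 and the exact slope 7/6; the printed 5.503 and 1.166 in the text reflect the rounding of $b_1'$ to 0.583.

```python
from fractions import Fraction

x = list(range(1, 9))
y = [2, 7, 11, 14, 16, 12, 13, 11]
n = len(x)
x_bar = Fraction(sum(x), n)
y_bar = Fraction(sum(y), n)

# Orthogonal form: a' = y_bar and b'_1 = sum(p'_1 y) / sum(p'_1 ** 2), p'_1 = 2x - 9.
p1 = [2 * xu - 9 for xu in x]
b1_prime = Fraction(sum(p * yu for p, yu in zip(p1, y)), sum(p * p for p in p1))
slope_from_orthogonal = 2 * b1_prime
intercept_from_orthogonal = y_bar - 9 * b1_prime

# Direct simple linear regression of y on x.
sp_xy = sum((xu - x_bar) * (yu - y_bar) for xu, yu in zip(x, y))
ss_x = sum((xu - x_bar) ** 2 for xu in x)
slope_direct = sp_xy / ss_x
intercept_direct = y_bar - slope_direct * x_bar

print(float(intercept_from_orthogonal), float(slope_from_orthogonal))
print(float(intercept_direct), float(slope_direct))
```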

14.6. CALCULATIONS IN MULTIPLE LINEAR REGRESSION

As long as we fit $y$ on one or two $x$ variables, the computations are
relatively simple. But this situation changes rapidly as the number of varia-
bles increases. For this reason we present a method which applies to all
multiple linear regression and which reduces the required calculations to a
minimum. The method, due to Doolittle [17] and Gauss [29], is particularly
useful with a desk calculator.

In regression problems there are lengthy calculations associated with
(1) finding the equation of best fit and (2) making statements about param-
eters. For (1) this requires that we solve a system of linear normal equations
[see Eq. (14.71)] and for (2) we must usually find the inverse of the coefficient
matrix of (1). We explain how the Doolittle-Gauss method [17, 22, 29, 57]
may be used for both (1) and (2). For a theoretical development of this and
many other methods available for solving simultaneous linear equations the
reader should study Dwyer [20].

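To simplify the explanation, we describe the case of three equations in
three unknowns. The normal equations are

$$\begin{cases} a_{11}b_1 + a_{12}b_2 + a_{13}b_3 = g_1 \\ a_{21}b_1 + a_{22}b_2 + a_{23}b_3 = g_2 \\ a_{31}b_1 + a_{32}b_2 + a_{33}b_3 = g_3 \end{cases} \tag{14.98}$$

The coefficient matrix $A$ is symmetric, that is, $a_{ij} = a_{ji}$ $(i, j = 1, 2, 3)$. The
Doolittle-Gauss procedure applies only to symmetric matrices.

The method involves finding another system of three equations in three
unknowns which has the same solution as (14.98) and is easy to solve. If it
is assumed that a unique solution exists, it can be shown that such a system
is given by

$$\begin{cases} a_{11}b_1 + a_{12}b_2 + a_{13}b_3 = g_1 \\ \qquad\quad\ a_{22 \cdot 1}b_2 + a_{23 \cdot 1}b_3 = g_{2 \cdot 1} \\ \qquad\qquad\qquad\quad\ \ a_{33 \cdot 12}b_3 = g_{3 \cdot 12} \end{cases} \tag{14.99}$$

where

$$\begin{cases} a_{2j \cdot 1} = a_{2j} - d_{12}a_{1j} \quad \text{and} \quad d_{1j} = a_{1j}/a_{11} \qquad (j = 2, 3) \\ g_{2 \cdot 1} = g_2 - d_{12}g_1 \\ a_{33 \cdot 12} = a_{33} - d_{13}a_{13} - d_{23 \cdot 1}a_{23 \cdot 1} \quad \text{and} \quad d_{23 \cdot 1} = a_{23 \cdot 1}/a_{22 \cdot 1} \\ g_{3 \cdot 12} = g_3 - d_{13}g_1 - d_{23 \cdot 1}g_{2 \cdot 1} \end{cases} \tag{14.100}$$

If the equations in (14.99) are divided by $a_{11}$, $a_{22 \cdot 1}$, $a_{33 \cdot 12}$, respectively, the
resulting system has the same solution as (14.98). That is, the system

$$\begin{cases} b_1 + d_{12}b_2 + d_{13}b_3 = h_1 \\ \qquad\ \ b_2 + d_{23 \cdot 1}b_3 = h_{2 \cdot 1} \\ \qquad\qquad\qquad\ \ b_3 = h_{3 \cdot 12} \end{cases} \tag{14.101}$$

where

$$h_1 = \frac{g_1}{a_{11}}, \qquad h_{2 \cdot 1} = \frac{g_{2 \cdot 1}}{a_{22 \cdot 1}}, \qquad h_{3 \cdot 12} = \frac{g_{3 \cdot 12}}{a_{33 \cdot 12}}$$

gives the solution of (14.98). An illustration is given in Example 14.6.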

Solving a symmetric system of linear equations by the Doolittle-Gauss
method is particularly attractive because it lends itself to a compact compu-
tational form. Since the solution depends on the coefficients, we start by
writing (14.98) in synthetic form as

    a_11   a_12   a_13 | g_1
    a_21   a_22   a_23 | g_2          (14.102)
    a_31   a_32   a_33 | g_3

where the unknowns $b_1$, $b_2$, $b_3$ are omitted and the equality marks are re-
placed by a vertical line segment. Next, the coefficients of the two linear
systems (14.99) and (14.101) are meshed and arranged below the coefficients
of (14.102), as shown in Table 14.11. As is illustrated in Example 14.6, this
allows a convenient computational pattern to be utilized.

Example 14.6. Use the Doolittle-Gauss method to solve the simple linear
system

$$\begin{cases} 5b_1 + 3b_2 + 2b_3 = 5 \\ 3b_1 + 4b_2 + b_3 = -2 \\ 2b_1 + b_2 + 3b_3 = 9 \end{cases}$$

The solution is given in Table 14.11. The first three rows are the coef-
ficients in the system. The fourth is the same as the first row. The fifth row
is obtained by dividing each number in the fourth by the leading number 5.

Table 14.11

Synthetic Doolittle-Gauss Method for Solving Linear Equations

    Row      b_1       b_2       b_3    |     g

     1        5         3         2     |     5
     2        3         4         1     |    -2
     3        2         1         3     |     9

     4        5         3         2     |     5
     5        1       0.600     0.400   |   1.000
     6                2.200    -0.200   |  -5.000
     7                1.000    -0.091   |  -2.273
     8                          2.182   |   6.546
     9                          1.000   |   3.000

The numbers in the sixth row are given by

    4 - 3(0.600) = 2.200
    1 - 2(0.600) = -0.200
    -2 - 5(0.600) = -5.000

The seventh row is found on dividing each of these numbers by the leading
number 2.200. The numbers in the eighth row are computed as follows

    3 - 2(0.400) - (-0.200)(-0.091) = 2.182
    9 - 5(0.400) - (-5.000)(-0.091) = 6.546

Dividing these numbers by 2.182 gives the numbers in the ninth row. Using
the ninth, seventh, and fifth rows, respectively, we find the solutions to be

    b_3 = h_{3.12} = 3.000
    b_2 = h_{2.1} - d_{23.1} b_3 = -2.273 - (-0.091)(3.000) = -2.000
    b_1 = h_1 - d_12 b_2 - d_13 b_3 = 1.000 - (0.600)(-2.000) - (0.400)(3.000)
        = 1.000

Since the values are found in reverse order, $b_3$ being used to find $b_2$, and $b_3$
and $b_2$ being used to find $b_1$, we term this the back solution for a set of linear
equations. With a little practice the reader should become very proficient
with this pivotal method.
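The forward reduction and back solution described above can be sketched for a symmetric system of any size. This is an illustration of the pattern in Example 14.6, not a transcription of a published routine: the function name is hypothetical, and each stored row begins at the diagonal element, as in the synthetic table.

```python
def doolittle_gauss_solve(A, g):
    """Solve a symmetric system A b = g by forward reduction and back solution.

    reduced[i] holds the ith equation of the triangular system (14.99);
    divided[i] holds the same row divided by its leading number, as in (14.101).
    """
    n = len(A)
    reduced, divided = [], []
    for i in range(n):
        # Start from the diagonal element of row i, followed by the right-hand side.
        row = [float(v) for v in A[i][i:]] + [float(g[i])]
        for k in range(i):
            d_ki = divided[k][i - k]                 # d value linking rows k and i
            row = [v - d_ki * r for v, r in zip(row, reduced[k][i - k:])]
        reduced.append(row)
        divided.append([v / row[0] for v in row])

    # Back solution: the last unknown first, then upward.
    b = [0.0] * n
    for i in reversed(range(n)):
        b[i] = divided[i][-1] - sum(divided[i][j - i] * b[j]
                                    for j in range(i + 1, n))
    return b, reduced, divided


A = [[5, 3, 2], [3, 4, 1], [2, 1, 3]]
g = [5, -2, 9]
b, reduced, divided = doolittle_gauss_solve(A, g)
print([round(v, 3) for v in b])
```

The lists `reduced` and `divided` correspond to rows 4, 6, 8 and rows 5, 7, 9 of the synthetic table, respectively, so intermediate values can be compared with a hand computation.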

Example 14.7. Find the inverse matrix of the coefficient matrix in
Example 14.6. That is, find the inverse of

$$A = \begin{bmatrix} 5 & 3 & 2 \\ 3 & 4 & 1 \\ 2 & 1 & 3 \end{bmatrix}$$

The columns of the inverse matrix [see (14.76)] are found by solving three
systems of linear equations like (14.77). In each case $A$ is the coefficient
matrix of the unknowns. Thus, the calculations of three synthetic Doolittle-
Gauss solutions can be meshed into one as shown in Table 14.12.


Table 14.12

Doolittle-Gauss Method for Finding the Inverse of a Symmetric Matrix

    Row     Coefficient Matrix        |      Identity Matrix        |  Row Sum
                                      |                             |  Check

     1       5       3        2       |     1        0        0     |   11
     2       3       4        1       |     0        1        0     |    9
     3       2       1        3       |     0        0        1     |    7

     4       5       3        2       |     1        0        0     |   11
     5       1     0.600    0.400     |   0.200      0        0     |  2.200
     6             2.200   -0.200     |  -0.600    1.000      0     |  2.400
     7             1.000   -0.091     |  -0.273    0.455      0     |  1.091
     8                      2.182     |  -0.455    0.091    1.000   |  2.818
     9                      1.000     |  -0.208    0.042    0.458   |  1.292

    10                                |   0.458   -0.292   -0.208   |
    11                                |  -0.292    0.459    0.042   |
    12                                |  -0.208    0.042    0.458   |


The computations in the first nine rows of Table 14.12 are carried out
just as in Table 14.11. The entries in the row sum check column after the
fifth are obtained in two ways. For example, the sixth entry is given by

    2.200 + (-0.200) + (-0.600) + 1.000 = 2.400

or

    9 - 11(0.600) = 2.400

The seventh entry is

    1.000 + (-0.091) + (-0.273) + 0.455 = 1.091

or

    2.400 / 2.200 = 1.091

Since a pair of answers are the same to three significant figures, we consider
the check indicates that no errors have been made in computations through
the seventh row. This simple check is one of the good features of the
Doolittle-Gauss method. It protects against most errors made in computa-
tion.

The inverse matrix $C$, rows 10, 11, and 12 of Table 14.12, is symmetric,
since $A$ is symmetric. Thus, the first row and first column of $C$ can be found
by using the first four columns in Table 14.12 in exactly the same way that
the four columns were used in Table 14.11. That is

    c_31 = c_13 = -0.208
    c_21 = c_12 = -0.273 - (-0.091)(-0.208) = -0.292
    c_11 = 0.200 - (0.400)(-0.208) - (0.600)(-0.292) = 0.458

Using the same method, replacing the fourth column by the fifth and sixth,
respectively, we find that

    c_32 = c_23 = 0.042
    c_22 = 0.455 - (-0.091)(0.042) = 0.459
    c_33 = 0.458

This is sometimes called the back solution method [15] for finding the inverse.
The student should practice the procedure until it becomes thoroughly
familiar to him.
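The meshed computation of Table 14.12, including the row sum check, can be sketched as follows. The routine is an illustration under the same assumptions as the hand method (a symmetric matrix with a unique solution); the function name and the tolerance are hypothetical choices.

```python
def doolittle_gauss_inverse(A, tol=1e-9):
    """Invert a symmetric matrix by the meshed forward reduction and back solution.

    Each working row holds the coefficients from the diagonal onward, the
    identity columns, and a final row-sum check entry.
    """
    n = len(A)
    reduced, divided = [], []
    for i in range(n):
        identity_row = [1.0 if j == i else 0.0 for j in range(n)]
        check = float(sum(A[i])) + 1.0                   # full row sum, as tabled
        row = [float(v) for v in A[i][i:]] + identity_row + [check]
        for k in range(i):
            d_ki = divided[k][i - k]
            row = [v - d_ki * r for v, r in zip(row, reduced[k][i - k:])]
        # The check entry, reduced like any other, must equal the row total.
        if abs(sum(row[:-1]) - row[-1]) > tol:
            raise ArithmeticError("row sum check failed in row %d" % (i + 1))
        reduced.append(row)
        divided.append([v / row[0] for v in row])

    # Back solution, one column of the inverse per identity column.
    C = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in reversed(range(n)):
            h = divided[i][(n - i) + col]
            C[i][col] = h - sum(divided[i][j - i] * C[j][col]
                                for j in range(i + 1, n))
    return C


A = [[5, 3, 2], [3, 4, 1], [2, 1, 3]]
C = doolittle_gauss_inverse(A)
for row in C:
    print([round(v, 3) for v in row])
```

With floating-point arithmetic the check is nearly exact; in hand work carried to three decimals, agreement to about three significant figures is what the text treats as satisfactory.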

The back solution method has the obvious disadvantage of requiring that
the numbers in the inverse be found sequentially. However, the numbers in
columns 4, 5, and 6 of Table 14.12 can be used in a simple direct method (see
Exercise 14.26 and Ref. [20]) to find any number of the inverse without using
others. For large matrices either the abbreviated Doolittle method [20, 57]
or the square root method [8, 19, 23] should be applied. Of course, where
electronic machines are available the complexities of computation may be
avoided.

14.7. REGRESSION FOR OTHER CASES

In the earlier sections of this chapter we discussed regression only for
cases in which $\eta_u$, the array mean, is a linear function of the parameters
(the regression coefficients). We assumed the random variables $y_u$ to be
normally distributed with means $\eta_u$ and homogeneous variances $\sigma_u^2 = \sigma^2$.
Now, we look briefly at some other cases.

We first note that the usual procedures of linear regression may be carried
out in cases where the $u$th array variance $\sigma_u^2$ is expressed in terms of $\sigma^2$ and $w_u$,
$\sigma^2$ being unknown and the $w_u$ known constants. Actually, in simple linear regres-
sion $w_u$ is usually taken to be a known function of $x$. It is left as an exercise
for the student to find the expression for and distribution of $a$, $b$, and $\hat{\eta}_u$
for a simple regression model.

In cases where $\eta = E(y|x)$ is not a linear function of $x$, but is a linear
function of the parameters, we may still use the methods of linear regression.
For example, in the two-parameter case, if $k(y)$ is normally distributed with
mean

$$k(\eta) = E[k(y) \mid g(x)] = \alpha'' + \beta''g(x) \tag{14.104}$$

and variance

$$V[k(y) \mid g(x)] = \sigma^2 \tag{14.105}$$

the usual theory applies. That is, the scatter diagram of points $(g(x_u), k(y_u))$
in the rectangular co-ordinate system with horizontal axis $g(x)$ and vertical
axis $k(y)$ fall along the true straight line whose equation is given in Eq.
(14.104). Thus, to find a simple regression curve by the usual indirect pro-
cedure, one transforms each data point $(x_u, y_u)$ to $(g(x_u), k(y_u))$, finds estimates
$a''$ and $b''$ of $\alpha''$ and $\beta''$ in the usual way, and then solves the estimating
regression equation

$$k(\hat{\eta}) = a'' + b''g(x)$$

for $\hat{\eta}$. The resulting equation is not generally identical with the one that
would have been obtained had we proceeded in the direct and more compli-
cated way of Sect. 14.2. The estimators of the regression coefficients obtained
in this way do not generally have the desirable properties of estimators ob-
tained in the direct way. But since the difference is usually not large, the
indirect approximate method is normally preferred due to simplicity of
application. The student should recognize that such a device allows an
investigator to obtain fairly good estimates of scatter diagrams of many
shapes. (Of course, the investigator's theoretical knowledge of the underlying
structure is of primary importance in making a decision about the model
to use.) A few functional relationships which satisfy condition (14.104) are
given in the next paragraph. The reader who is interested in applications may
work the exercises or consult the literature for illustrations. A few references
[33, 40, 41, 46, 48, 53, 59] are given at the end of this chapter.
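The indirect procedure (transform, fit a straight line, solve back) can be sketched in a few lines. Everything specific here is an assumption made for illustration: the function names are hypothetical, the power curve with $k(y) = \log_e y$ and $g(x) = \log_e x$ is just one admissible pair of transformations, and the data are synthetic, noise-free values generated from $\eta = 2x^{1.5}$ so that the recovered constants can be checked.

```python
import math


def fit_line(u, v):
    """Least squares intercept and slope for the regression of v on u."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    slope = (sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
             / sum((ui - u_bar) ** 2 for ui in u))
    return v_bar - slope * u_bar, slope


def fit_transformed(x, y, g, k):
    """Estimate a'' and b'' in k(eta) = a'' + b'' g(x) from transformed points."""
    return fit_line([g(xu) for xu in x], [k(yu) for yu in y])


# Synthetic, noise-free illustration: eta = 2 * x ** 1.5.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * xu ** 1.5 for xu in x]

a2, b2 = fit_transformed(x, y, math.log, math.log)
gamma = math.exp(a2)          # solving log(eta) = a'' + b'' log(x) for eta

def eta_hat(xu):
    return gamma * xu ** b2

print(round(gamma, 6), round(b2, 6), round(eta_hat(6.0), 4))
```

With real data containing error, the constants found this way minimize squared deviations in the transformed scale, which is why the text notes that they generally differ from the direct least squares estimates.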

Putting g(A:) equal to x, x"‘, logs x, and b^, respectively, and k(y) equal 
^^^P^<^tively, leads to the eight equations given in Table 
•13. It is to be understood that 6 is a constant. On solving each of these 
equations for 97 we obtain the equations of Table 14.14, which have horizontal 
or vertical asymptotes in most cases and varying degrees of curvature. The 
symbol 7 denotes fe“". 
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T*M« 14.JJ 

Particular Cases of «(*) — « + s g{x) 



»(i|) V ' 

II 

? 


-L_«' + A" * 

log, n - rr- + a- X 

1 

i . 


A 



log, Jr 

— a +a~lof,x 

log, .1 « + fi log, * 

b ■* 

1 a + a- b A 

log, 1 - o' +8 Ir^ 


Equations (1) and (2) in Table 14.14 represent hyperbolas. Eqs. (5) and
(7) represent exponential and power curves, respectively. When b = e, Eq.
(4) is a special case of the very important logistic equation. All these
equations serve as useful models, and there are many illustrations of their
use. The interested student may first look to Refs. [12, 33, 41, 46, 53, 59],
which give other references. In particular, the logistic curve, a special case
of a growth curve, has frequently been used in studies of such things as human
and insect populations and growth of cells and telephone subscribers. There
are also other growth curves [33, 48, 59] which have proved to be very useful
in applied work.
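
As a minimal sketch of the indirect method, the following Python fragment fits the exponential curve $\eta = \gamma e^{\beta x}$ by computing the ordinary linear regression of $\log_e y$ on $x$ and then transforming the intercept back. The data are hypothetical (generated exactly from a known curve), and the function names `fit_line` and `fit_exponential` are illustrative assumptions, not part of the text.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b = sp / ss_x
    a = sum(ys) / n - b * sum(xs) / n
    return a, b

def fit_exponential(xs, ys):
    """Fit y = gamma * exp(beta * x) through the regression of log y on x."""
    a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(a), b  # gamma = e^alpha, and beta

# Hypothetical data lying exactly on y = 2 * exp(0.5 * x)
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * x) for x in xs]
gamma, beta = fit_exponential(xs, ys)
print(round(gamma, 6), round(beta, 6))
```

With real data that scatter about the curve, estimates obtained this way will generally differ somewhat from those found by applying least squares directly to the untransformed observations.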


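Table 14.14

Solution of the Equations in Table 14.13 with Respect to $\eta$

$g(x)$        Equation    $h(\eta) = \eta^{-1}$                        Equation    $h(\eta) = \log_e \eta$
              Number                                                   Number

$x$           1           $\eta = 1/(\alpha + \beta x)$                5           $\eta = \gamma e^{\beta x}$
$x^{-1}$      2           $\eta = x/(\alpha x + \beta)$                6           $\eta = \gamma e^{\beta/x}$
$\log_e x$    3           $\eta = 1/(\alpha + \beta \log_e x)$         7           $\eta = \gamma x^{\beta}$
$b^x$         4           $\eta = 1/(\alpha + \beta b^x)$              8           $\eta = \gamma e^{\beta b^x}$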


There are important cases where the parameters of the regression model
should be estimated by an iterative procedure. For a detailed discussion of
such methods see Garwood [28] and others [30, 34, 35]. Illustrations are
found in Refs. [28, 42, 43].

Other procedures are also used to estimate parameters in regression
models. Cornell [12, 13] gives four methods of estimating parameters along
with illustrations for linear combinations of exponentials. Hald [33] gives
references to studies in time series. Askovitz [7] gives a graphic method for 
fitting a line. For other studies in regression, read Refs. [12, 30, 40, 55]. 


14.8. EXERCISES 

14.1. (a) Plot the scatter diagram for the following data 


x    1    1    3    3    4    6    6    8
y    8    7    6    5    5    2    2

(b) Use the data in (a) to find the estimated linear regression of y on x. 

(c) Test the hypothesis that the slope of the true regression line is zero. 

14.2. Give three examples from your area of study of the use of regression. 
Carefully define all variables, suggest the functional form of the
regression equation, and indicate some uses of each example.

14.3. (a) At $x = 1$ it is known that $y$ is $n(2, \sigma^2 = 16)$ and at $x = 7$,
$y$ is $n(5, \sigma^2 = 16)$. Find the equation for the true linear regression of y on
x. Making the usual assumptions of linear regression, what can you say
about the distribution of the array at $x = 2$? (b) The following data
were obtained for the regression in (a)


x    1    1    4    4    7    8    8
y    1    3    4    3    4    5    4


Find the estimated linear regression of y on x. At x = 8, find the true 
and estimated regression mean, the array mean, and then compare the 
true and estimated error effects for both observed values, (c) Find a 
90 per cent confidence interval for the true slope. Does the true slope 
fall in this interval? (d) Find a 90 per cent confidence interval for 
the true array mean at x = 2. Does the true array mean fall in this 
interval? 

14.4. Prove that is an unbiased estimator of cr^ 

14.5. For a sample of 11 pairs (x, y) it was found that $\sum x = 34$, $\sum y = 85$,
$\sum x^2 = 676$, $\sum y^2 = 815$, and $\sum xy = 326$. (a) Find the three sums of
squares in (14.52). (b) Use a five per cent level test to determine if the
true slope is significantly different from 2. (c) Find a 95 per cent
confidence interval for the common variance. (d) Test the linearity of
regression.

14.6. Prove Eq. (14.47) and show that $s^2_{y \cdot x}$, as defined in Eq. (14.48), is an
unbiased estimator of $\sigma^2$.

14.7. Derive Eq. (14.52) starting with Eq. (14.54). 

14.8. Show that $SSW/(n - k)$ is an unbiased estimator of $\sigma^2$, and that
$SSD/(k - 2)$ is an unbiased estimator of $\sigma^2$ provided that the true
regression is linear.

14.9. Prove that $s^2_{y \cdot 12}$ as defined in Eq. (14.66) is an unbiased estimator of $\sigma^2$.

14.10. Prove the Gauss-Markoff theorem.

14.11. The data in Table 14.15 on multiple linear regression are chosen so as
to restrict the amount of computation. (The reader may think of these
as coded data which relate price of an item to cost of raw material and
cost of labor; death rate of adult males to percentage of fat calories in
diet and value of home; grades on a particular course to I.Q. and amount
of time spent in study; reciprocal of tar content of a gas stream to gas
inlet temperature and rotor speed; etc.) (a) Find the linear regression of


Table 14.15

Object    $x_1$    $x_2$    $y$

   1        1        1        3
   2        1        2        4
   3        2        2        3
   4        2        3        5
   5        2        4        6
   6        3        2        5
   7        3        3        4
   8        3        4        7
   9        4        4        8
  10        4        5        9
  11        5        3        7
  12        5        5       10

y on $x_1$ and $x_2$. (b) Test the hypothesis $\beta_1 = \beta_2 = 0$ at the five per cent
level. (c) Find the inverse matrix. (d) Test the hypothesis = 0 at the
five per cent level. (e) Find a 95 per cent confidence interval for

14.12. Derive Eq. (14.70).

14.13. Show that $\mathrm{cov}(b_i, b_j) = c_{ij}\sigma^2$ in multiple linear regression of y on $x_1$,
$x_2$, and $x_3$.

14.14. In a study with 25 data points of the form $(y, x_1, x_2, x_3)$ it was found that

2*1 = 2189 = 89,341 

2 *« = 158 2 * 1*1 = 18.225 

= 14 J 2*1*-, = is.oso 

25’= 1418 2*i5'= 140.444 

2*1 = 327.350 2*1*1 = 4593 

2*1 = 24,781 2**P= 44.783 

2*1 = 877 2*15’ = 8527 

(a) Find the linear regression of y on $x_1$, $x_2$, and $x_3$. (b) Find the linear
regression of y on $x_1$ and $x_2$; of y on $x_1$ and $x_3$; of y on $x_2$ and $x_3$. (c) Find
the linear regression of y on $x_1$; of y on $x_2$; of y on $x_3$. (d) Find the
inverse matrix for part (a). (e) Test the hypothesis $\beta_1 = \beta_2 = \beta_3 = 0$
at the five per cent level. (f) Use (a) and (d) to establish 90 per cent
confidence intervals for $\beta_1$, $\beta_2$, and $\beta_3$, respectively. (g) Test the
hypothesis $\beta_2 = \beta_3 = 0$ at the five per cent level.

Hint. To find the regression sum of squares due to $b_2$ and $b_3$, $RSS(b_2, b_3)$,
subtract the regression sum of squares due to fitting y on $x_1$, $RSS(b_1')$,
from the regression sum of squares due to fitting $b_1$, $b_2$, and $b_3$,
$RSS(b_1, b_2, b_3)$. It can be shown that the expected value of

$$RSS(b_1, b_2, b_3) - RSS(b_1') = RSS(b_2, b_3)$$

is $\sigma^2$ plus a function of $\beta_2$ and $\beta_3$ only. Thus, under the null hypothesis

$$\frac{RSS(b_2, b_3)/(3 - 1)}{s^2_{y \cdot 123}}$$


Table 14.16*

Round
Number    $x_1$    $x_2$    $x_3$      $y$

   1        55      65      177     -188
   2        74      45      139     -113
   3        97      60      141      -59
   4       168      75       60      276
   5       126      98       71      232
   6       111      82      113      114
   7       113      75      137       47
   8        45      70      131     -155
   9        79      43      136     -101
  10        81      77       89        4
  11        92      45      178     -147
  12       114      74      104      108
  13        77      55      144      -76
  14        64      69      100      -22
  15       127      73      115      -11
  16       159      61      112      113
  17        85      78      135      -45
  18        96      61       97       69
  19       103      74      134      -61
  20       158      63       77       85
  21       111      62      141       72
  22       104      81      200      116
  23        83      51      143      -67
  24        95      52      113      -69
  25        91      66      141     -148

* Data by courtesy of Paul C. Cox.

is distributed as F with two and 21 degrees of freedom.

(h) Test each of the following hypotheses at the five per cent level

H, ^ i8. = 0 

ff» /S. “A-o A. A=*0 

(i) Letting $y$, $x_1$, $x_2$, and $x_3$ be particular variables in your area of research,
write a summary statement for the experiment of this exercise.

14.15. After the propellant weight, propellant temperature, and total weight
were recorded, 25 rockets were fired at a fixed target. The range error,
that is, the component of miss distance parallel to the straight line
determined by the target and the launcher, was then measured.
Table 14.16 gives the propellant weight minus 2000 lb, the propellant
temperature in centigrade units, the total weight minus 5700 lb, and the
range error in yards, and they are designated by $x_1$, $x_2$, $x_3$, and $y$,
respectively. Use these data to find solutions required in Exercise 14.14(a),
(b), (c), (d), (e), (f), (g), (h), and (i).

14.16. Find the polynomial which adequately represents the following
hypothetical data


x     1     2     3     4     5     6     7     8

y    11    13    12    16    14    11     7     2

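14.17. Derive the orthogonal polynomials in Eq. (14.90).

Hint. Let

$$P_i(x_u) = c_{i0} + c_{i1}x_u + c_{i2}x_u^2 + \cdots + c_{ii}x_u^i,$$

where $i = 1, \ldots, q$ and $u = 1, \ldots, n$. Find $c_{ij}$ so that

$$\sum_u P_i(x_u)\,P_j(x_u) = 0$$

for every pair of polynomials where $i \neq j$.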

14.18 Use Eq (14 91) to firidp .t for 1 = 1 2 ,6 Also, findXi.iand/v«i 

14.19. Verify the values in Table 14.8.

14.20. Derive the normal equations in Eq. (14.92).

14.21. Find 95 per cent confidence intervals for each of the nonzero
parameters in Exercise 14.16.

14.22. Find general expressions for

<»- 1,2 3 4 5) 


14.23. Management programs for the conservation of salmon fisheries depend
on investigations and records of the past. Among the many variables
studied are total runs in thousands, escapements in thousands, and
percentage of escapement. The percentage of escapement in the years
1894-1945 for the Fraser River Sockeye Salmon in each of the four-year
age cycles are shown in Table 14.17. (a) Use the 52 percentages to find

Table 14.17*

     CYCLE A              CYCLE B              CYCLE C              CYCLE D
        Percentage           Percentage           Percentage           Percentage
Year    Escapement   Year    Escapement   Year    Escapement   Year    Escapement

1894      44.5       1895      40.5       1896      21.8       1897      24.3
1898                 1899      11.2       1900       7.9       1901       8.4
1902                 1903      12.5       1904      14.3       1905      14.4
1906      23.4       1907      19.5       1908      19.7       1909       7.3
1910      20.6       1911      23.2       1912      24.5       1913      18.6
1914      10.8       1915      16.0       1916       8.3       1917       5.9
1918      14.3       1919      20.3       1920      26.3       1921      17.4
1922      29.4       1923      34.0       1924      31.1       1925      25.3
1926      47.8       1927      31.1       1928      25.0       1929      23.1
1930      17.1       1931      25.9       1932      29.9       1933      16.1
1934      16.2       1935      33.5       1936      42.6       1937      27.1
1938      25.4       1939      25.0       1940      28.4       1941      30.1
1942      22.4       1943      23.3       1944      26.4       1945      22.2

* Data taken from Table 8 of G. A. Rounsefell, “Methods of Estimating Total Runs 
and Escapements of Salmon,” Biometrics, Vol. 5 (1949), pp. 115-126. 


the polynomial which adequately represents the data, (b) For each of 
the four cycles find the polynomial which adequately represents the 
data, (c) Write a summary statement of your findings in (a) and (b) and 
indicate how this information might be used, (d) Discuss the assumptions 
of independence and normality as they relate to the data. 

14.24. Prove Eqs. (14.99) and (14.101). 

14.25. (a) Use the Doolittle-Gauss method to solve the simple linear system 

5a + 2b + 3c = 5
2a + 4b +  c = 12
3a +  b + 3c = 0

(b) Find the inverse matrix of the coefficient matrix in (a). 

14.26. (a) Use the numbers in columns 4, 5, and 6 of Table 14.12 to find the 
inverse matrix. 

Hint. An element in the ith row and jth column of the inverse matrix
may be found, after the elements of the identity matrix have been
omitted, by multiplying term for term the top (bottom) elements in the
i + 3 column by the bottom (top) elements in the j + 3 column and
adding. For example, the element in the first row and third column of
the inverse matrix is given by

(1)(0) + (-0.600)(0) + (-0.455)(0.458) = -0.208

or by

(0.200)(0) + (-0.273)(0) + (-0.208)(1.000) = -0.208

(b) Use the method of (a) to find the inverse matrix in Exercise 14.25(b).

(c) For a general symmetric 3 x 3 matrix, prove that the inverse matrix
is obtained by the method of (a).

14.27. In simple linear regression let all the usual assumptions hold except that
of homogeneous variances. Let the variance of the uth array be given by



$\sigma^2$ being unknown and $w_u$ a known constant. Find the expression for and
the distribution of a, b, and $s^2_{y \cdot x}$ for a simple regression model.

14.28. (a) Show that each of the equations in Table 14.13 can be solved for
$\eta$ to give the corresponding equations in Table 14.14. (b) Graph and
discuss from the mathematical point of view four families of curves in
Table 14.14.

14.29. The following data were drawn from a family of regression curves of
the type

X \ to It 1 2 1 5 20 30 40 50 60 70 

y [ 382 303 260 195 115 096 082 072 070 065 

(a) Make a transformation so that the regression is linear in the
parameters $\alpha$ and $\beta$, and then find the fitted linear regression equation for
the transformed data. (b) Use the linear regression equation found in
(a) to obtain a regression equation of the form

a <* ♦» /» 

Check to see how well this equation fits the data. (c) Without transforming
the data, find, if possible, the best fitting curve using the method of
least squares.

14.30. Use two methods to fit the following data to a model of the type
»7 = x' 

^ I 2 3 4 S 6 7 

y I 06 21 45 78 1 22 1 80 2 40 

14 31. For standard pieces of equipment (for example electronic) the model 
for the proportion q surviving * hours of operation is r? «= ^“*4* Use 
the data in Table 14.18 to find estimates of $\alpha$ and $\beta$. Discuss some
possible applications of the fitted equation. What is the fitted equation in
case $\alpha = 0$?


Table 14.18 



14.32. (a) Use the data in Table 14.19 to find an appropriate polynomial 
regression equation of density on depth of rock, (b) The densities are 

Table 14.19*

Depth in       Mean Density      Depth in       Mean Density
Feet           in g/cm^3         Feet           in g/cm^3

 851-951         2.378           2261-2361        2.577
 951-1051        2.367           2361-2461        2.506
1051-1151        2.423           2461-2561        2.400
1151-1251        2.337           2561-2661        2.345
1251-1351        2.435           2661-2761        2.535
1351-1451        2.391           2761-2861        2.612
1451-1561        2.557           2861-2961        2.569
1561-1661        2.441           2961-3061        2.615
1661-1761        2.462           3061-3161        2.648
1761-1861        2.425           3161-3261        2.637
1861-1961        2.456           3261-3361        2.659
1961-2061        2.507           3361-3461        2.623
2061-2161        2.508           3461-3491        2.624
2161-2261        2.548


* Data taken from Table 2b of M. J.S. Innes, “The Use of Gravity Methods 
to Study the Underground Structure and Impact Energy of Meteorite Craters,” 
Journal of Geophysical Research, Vol. 66 (1961), pp. 2225-39. 


for fragmental rocks located at the Brent Crater in the Canadian Shield. 
The crater has characteristics which strongly suggest meteoric origin. 
Discuss the data and regression analysis in terms of this incomplete 
information. 


14.9. CORRELATION AND ITS RELATION TO REGRESSION

In certain investigations both values of a pair (x, y) are random; the
x value is not controlled as in simple linear regression. For example, both
the grade in history and the grade in algebra of college students may be
assumed to be random, or both the temperature and the humidity on a given
day may be assumed to be random. In such cases, we think of sampling from
a bivariate distribution. When the random variables x and y are not
independently distributed, we think of the degree of association as being measured
by the correlation coefficient $\rho$, which is defined in Eq. (3.58) as

$$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \tag{14.105}$$

where $\sigma_{xy}$ denotes the covariance of x and y.

In order to relate correlation to regression, we require the density function
of y for a given x. According to Definition (5.62) the conditional density
function of y for a given x is

$$f(y \mid x) = \frac{f(x, y)}{f(x)} \tag{14.106}$$

where $f(x, y)$ denotes a bivariate density function of x and y and $f(x)$ the
marginal density function of x.

Since we assumed the arrays in regression to be normal, we consider the
case where $f(x, y)$ is the bivariate normal density function given in Eq. (3.57).
It can be shown (see Exercise 14.34) that y for a given value of x is normally
distributed with mean

$$\mu_{y \cdot x} = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}\,(x - \mu_x) \tag{14.107}$$

and variance

$$\sigma^2_{y \cdot x} = \sigma_y^2\,(1 - \rho^2) \tag{14.108}$$

From Eq. (14.107) it is clear that the means of the conditional distributions
(array distributions) fall on a straight line. From Eq. (14.108) we observe
that the variance is constant and does not depend on x, and $\rho^2$ lies between
0 and 1. Thus, the assumptions of simple linear regression of y on x are
satisfied. That is, the straight line on which the means of the conditional
distributions fall is a regression line. Hence, on comparing Eq. (14.107) with
Eq. (14.69) when $\beta_2 = 0$, we see that

$$\beta_{y/x} = \rho\,\frac{\sigma_y}{\sigma_x} \tag{14.109}$$

where $\beta_{y/x}$ denotes $\beta_1$. Since in a bivariate normal distribution x is a random
normal variable when y is fixed, we can show, in a similar way, that the slope
of the regression of x on y is related to the correlation coefficient by

$$\beta_{x/y} = \rho\,\frac{\sigma_x}{\sigma_y} \tag{14.110}$$

Further, the correlation coefficient is the geometric mean of the slopes of
the two regression lines; that is

$$\rho = \sqrt{\beta_{y/x}\,\beta_{x/y}}$$

Now, we observe [see Eq. (14.108)] that when $\rho = 1$, the variation about
the regression line (14.107) is zero. But this is also true of the variation about
the regression of x on y. This means that all points fall on the regression line.
Thus, when $\rho = 1$, it follows that the two regression lines are the same, and
the two-variate distribution actually is a one-variate distribution, since x
and y are linearly dependent variables. When $\rho = 0$, the variables are
independent and the regression lines are parallel to the co-ordinate axes. Finally,
when $0 < \rho < 1$, both regression lines have positive slopes; when
$-1 < \rho < 0$, both have negative slopes. Further, using Eq. (14.108), we
observe that

$$\rho^2 = 1 - \frac{\sigma^2_{y \cdot x}}{\sigma_y^2} \tag{14.111}$$

can be used as a measure of the degree of dependence of x and y. In a similar
way, we also observe that

$$\rho^2 = 1 - \frac{\sigma^2_{x \cdot y}}{\sigma_x^2} \tag{14.112}$$

can be used as a measure of the degree of dependence of x and y. That is,
the degree of dependence may be measured by the correlation coefficient
squared, by using the ratio of the variance about the regression of y on x
to the variance of the marginal distribution of y, or by using the ratio of the
variance about the regression of x on y to the variance of the marginal
distribution of x.

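As an estimator of $\rho$, we define the sample correlation coefficient of x
and y by

$$r = \frac{\dfrac{\sum (x_u - \bar{x})(y_u - \bar{y})}{n - 1}}{\sqrt{\dfrac{\sum (x_u - \bar{x})^2 \sum (y_u - \bar{y})^2}{(n - 1)^2}}} \tag{14.113}$$

which may also be written as

$$r = \frac{\sum (x_u - \bar{x})(y_u - \bar{y})}{\sqrt{\sum (x_u - \bar{x})^2 \sum (y_u - \bar{y})^2}}$$

or

$$r = \frac{SP}{\sqrt{SSx\,SSy}} \tag{14.114}$$

where

$$SP = \sum x_u y_u - \frac{(\sum x_u)(\sum y_u)}{n}, \qquad
SSx = \sum x_u^2 - \frac{(\sum x_u)^2}{n}, \qquad
SSy = \sum y_u^2 - \frac{(\sum y_u)^2}{n}$$

The sample correlation coefficient r defined in this way is an unbiased
estimator of the population correlation coefficient $\rho$ only when $\rho = 0$. Still,
for reasons beyond the scope of this book, r is usually considered the best
estimator of $\rho$.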
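
The computing form $r = SP/\sqrt{SSx\,SSy}$ can be sketched in a few lines of Python. The five data pairs below are hypothetical and serve only to exercise the formula; the function names are illustrative assumptions.

```python
import math

def corrected_sums(xs, ys):
    """Return SP, SSx, SSy: the corrected sum of products and sums of squares."""
    n = len(xs)
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return sp, ss_x, ss_y

def sample_correlation(xs, ys):
    """Sample correlation coefficient r = SP / sqrt(SSx * SSy)."""
    sp, ss_x, ss_y = corrected_sums(xs, ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: SP = 6, SSx = 10, SSy = 6
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = sample_correlation(xs, ys)
print(round(r, 4))
```

Working from the raw sums $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$, and $\sum xy$ mirrors the hand computation used throughout this chapter.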

It is easy to see that the numerical value of the sample correlation
coefficient does not depend on the unit of measure of either x or y. For,
if we let

$$v = c_1 x + c_2, \qquad w = c_3 y + c_4 \tag{14.115}$$

where $c_1$, $c_2$, $c_3$, $c_4$ are any constants except $c_1 \neq 0$, $c_3 \neq 0$, it follows that
the correlation coefficient in terms of v and w is

$$r_{vw} = \frac{\sum (v - \bar{v})(w - \bar{w})}{\sqrt{\sum (v - \bar{v})^2 \sum (w - \bar{w})^2}}
= \frac{\sum (c_1 x - c_1 \bar{x})(c_3 y - c_3 \bar{y})}{\sqrt{\sum (c_1 x - c_1 \bar{x})^2 \sum (c_3 y - c_3 \bar{y})^2}}$$

or

$$r_{vw} = r_{xy} \tag{14.116}$$

(apart from a change of sign when $c_1$ and $c_3$ have opposite signs).
Since the correlation coefficient does not depend on either scale of
measurement, we may use it in much the same way in which we applied the coefficient
of variation when we were discussing a single variate. Also, as can be observed
in Example 14.8, the computations are reduced considerably.

Table 14.20

Data from Table 14.3 Transformed

Example 14.8. Code the data in Table 14.3 and find the sample
correlation coefficient. Also, discuss the relation of r to the sample regression
coefficient $b_{y/x} = b$.

Table 14.20 gives the coded data along with the required products and
totals. Thus, we obtain directly

$$SP_{vw} = 86 - \frac{(\sum v)(\sum w)}{n} = 86.6$$

$$SS_v = 23 - \frac{(\sum v)^2}{n} = 22.9$$

$$SS_w = 352 - \frac{(\sum w)^2}{n} = 348.4$$

$$r^2 = \frac{(86.6)^2}{(22.9)(348.4)} = 0.9400$$

$$r = 0.9695$$

Since the sample correlation coefficient is a relatively large positive value,
we conclude that there is a strong degree of association between x and y,
or, for the regression of y on x, there is a strong dependence of y on a given
value of x. Furthermore, we conclude that the slope for the regression of y
on x is positive. Thus, as x increases, y increases and does not deviate much
from the regression line. But knowing r alone is not enough to determine the
mean value of y corresponding to any x. For this we need the regression
equation. (It should be understood that the statements about r require that
both x and y be random variables.)

Using property (14.53) for the regression sum of squares, the sum of
squares identity (14.52), and relation (14.114), we find that

$$r^2 = \frac{SP^2}{SSx\,SSy} = \frac{SSreg}{SSy} \tag{14.117}$$

or

$$r^2 = \frac{SSy - SSres}{SSy} = 1 - \frac{SSres}{SSy} \tag{14.118}$$

From Eq. (14.117) we see that the square of the correlation coefficient is
equal to the ratio of the regression sum of squares to the total sum of
squares. It is sometimes called the coefficient of determination and is very
useful in explaining what proportion of the total variation is due to regression.
In connection with Example 14.8, we observe that 94 per cent of the variation
in the y observations is due to the regression of y on x. Since

$$0 \le SSres \le SSy$$

we conclude, using Eq. (14.118), that

$$0 \le r^2 \le 1$$

or

$$-1 \le r \le 1$$

Thus, the sample correlation ranges over the same set of values as $\rho$.
Furthermore, for any fixed value of $\rho$ except -1 or 1, the sample values of
r fall in the interval $-1 \le r \le 1$.

It should be realized that the correlation coefficient r is a measure of
linear association or dependence. If all points fall on a straight line not
parallel to one of the coordinate axes, r is 1 or -1. The fact that two
variables x and y are functionally related is not enough to insure that r = 1
or -1. Consider, for example, the three functions

(a) $y = x + 1$

(b) $y = x^2 - 4x + 4$

(c) $y = x^2 + 2x - 3$

Compute r for each function when x = 0, 1, 2, 3, 4. The corresponding y
values are shown in Table 14.21 along with sums of squares and products
and the required totals. By using Eq. (14.114) it is easy to show that the
correlation coefficients for (a), (b), and (c) are 1, 0, 0.98, respectively. This
should illustrate why it is so important that the trend be linear or nearly
linear when the correlation coefficient r is used.
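
The three correlation coefficients can be checked with a short Python sketch that applies the computing form of Eq. (14.114) to each function at x = 0, 1, 2, 3, 4. The helper name `sample_r` and the dictionary layout are illustrative choices, not notation from the text.

```python
import math

def sample_r(xs, ys):
    """r = SP / sqrt(SSx * SSy), computed from raw sums."""
    n = len(xs)
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return sp / math.sqrt(ss_x * ss_y)

xs = [0, 1, 2, 3, 4]
functions = {
    "a": lambda x: x + 1,             # straight line
    "b": lambda x: x * x - 4 * x + 4, # parabola symmetric about x = 2
    "c": lambda x: x * x + 2 * x - 3, # parabola increasing on this range
}
results = {name: sample_r(xs, [f(x) for x in xs]) for name, f in functions.items()}
for name, value in results.items():
    print(name, round(value, 2))
```

Function (b) is an exact functional relation, yet its symmetry about the mean of x makes the sum of products vanish, so r carries no trace of the dependence.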


Table 14.21

Values for Three Functions

          Functional Values                y^2 Values for          xy Values for
  x       (a)     (b)     (c)     x^2      (a)     (b)     (c)     (a)     (b)     (c)

  0        1       4      -3       0        1      16       9       0       0       0
  1        2       1       0       1        4       1       0       2       1       0
  2        3       0       5       4        9       0      25       6       0      10
  3        4       1      12       9       16       1     144      12       3      36
  4        5       4      21      16       25      16     441      20      16      84

Totals
 10       15      10      35      30       55      34     619      40      20     130


If $\rho = 0$, the sampling distribution of r is symmetric, but as $\rho$ approaches
either -1 or 1, asymmetry increases. Also, the distribution of r depends on
the sample size n. For small samples, r is quite variable, particularly when
$\rho$ is in the neighborhood of zero. Three graphs are shown in Fig. 14.5.

Fig. 14.5 Distribution of r for $\rho$ = 0, 0.4 and 0.8 when n = 10

Since the sampling distribution of r is complicated, the percentage points
are quite difficult to compute directly. Fortunately, due to the work of David
[14] and Fisher [26], we can determine confidence intervals and test hypotheses
directly with the use of charts and simple transformations. For selected values
of n smaller than 400, David computed the distribution of r. From these
extensive calculations David obtained charts reproduced in Table X which
are useful in finding 95 and 99 per cent confidence intervals or in testing
hypotheses about $\rho$. For example, suppose a sample of size 20 has a
correlation coefficient of 0.6 and we wish to find the 95 per cent confidence
interval for $\rho$. Using Table X, we draw a vertical line through r = 0.6 until
it cuts the two curves corresponding to n = 20. At the points of intersection,
draw horizontal lines until they cut the vertical axis $\rho$. The limits are found to
be 0.20 and 0.82 so that the 95 per cent confidence interval for $\rho$ based on
n = 20 and r = 0.6 is

$$0.20 < \rho < 0.82$$

Note that this same chart can be used to test the hypothesis $\rho = \rho_0$ against
$\rho < \rho_0$ at the 2.5 per cent level or to test the hypothesis $\rho = \rho_0$ against
$\rho \neq \rho_0$ at the five per cent level, $\rho_0$ being any constant in the interval
$-1 < \rho_0 < 1$.

For values of n greater than 49, say, we can use a transformation given
by Fisher [26]. He showed that the random variable

$$z = \tfrac{1}{2}\log_e \frac{1 + r}{1 - r} = \operatorname{arc\,tanh} r, \qquad -1 < r < 1 \tag{14.119}$$

is approximately normally distributed with mean

$$\mu_z = \tfrac{1}{2}\log_e \frac{1 + \rho}{1 - \rho} + \frac{\rho}{2(n - 1)} \tag{14.120}$$

and variance

$$\sigma_z^2 = \frac{1}{n - 3}$$

To find the $100(1 - \alpha)$ per cent confidence interval for $\rho$, first use the normal
distribution to find a $100(1 - \alpha)$ per cent confidence interval for $\mu_z$ and then
apply the inverse transformation to obtain the interval for $\rho$. That is, use
Eq. (14.119) to find z, the normal tables to find $u_{\alpha/2}$, and then compute

$$z \pm \frac{u_{\alpha/2}}{\sqrt{n - 3}} \tag{14.121}$$

Letting $z_1$ and $z_2$ denote the lower and upper limits, respectively, in (14.121),
and assuming that $\rho/(2(n - 1))$ in Eq. (14.120) is small enough to be ignored,
we then transform, by the use of

$$\rho = \tanh z = \frac{e^{2z} - 1}{e^{2z} + 1} \tag{14.122}$$

to find $\rho_1$ and $\rho_2$, which are approximate $100(1 - \alpha)$ per cent confidence
limits of $\rho$. Any standard mathematics table may be used in the
transformations (14.119) and (14.122). To test the hypothesis $\rho = \rho_0$, we may use the
confidence limits as guides or we may compute

$$u = \left(z - \tfrac{1}{2}\log_e \frac{1 + \rho_0}{1 - \rho_0}\right)\sqrt{n - 3} \tag{14.123}$$

and compare it with the appropriate standard normal u value.
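
The transformation method can be sketched in Python as follows. This is an illustration under stated assumptions: the bias term $\rho/(2(n - 1))$ is ignored, the normal percentage point comes from the standard library's `statistics.NormalDist` rather than from tables, and the function name `fisher_interval` is hypothetical. The case r = 0.6, n = 20 is used only so that the result can be set beside the chart values 0.20 and 0.82 quoted earlier; the text recommends the transformation for larger n.

```python
import math
from statistics import NormalDist

def fisher_interval(r, n, confidence=0.95):
    """Approximate confidence limits for rho from Fisher's z transformation."""
    z = math.atanh(r)                                   # z = arc tanh r
    u = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided normal point
    half_width = u / math.sqrt(n - 3)                   # u * sigma_z
    # Transform the limits for mu_z back to limits for rho
    return math.tanh(z - half_width), math.tanh(z + half_width)

lower, upper = fisher_interval(0.6, 20)
print(round(lower, 2), round(upper, 2))
```

Even at n = 20 the approximate limits come out close to those read from the charts, which suggests the approximation degrades gracefully for moderate sample sizes.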

In the particular important case where $\rho = 0$, it can be shown that the
random variable

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \tag{14.124}$$

has the Student t distribution with n - 2 degrees of freedom. Thus, we can
make an exact test of the hypothesis $\rho = 0$. Since $t^2 = F$, we may also use
the F distribution with one degree and n - 2 degrees of freedom for the test
of the hypothesis $\rho = 0$. In either case, when $\rho$ actually is zero, the y arrays
are independent (not dependent) of x. But this is the same situation that
exists in the linear regression problem when $\beta = 0$. Indeed, if we test
$\beta = 0$ and $\rho = 0$, using the t distribution, the particular values of the
respective statistics always turn out to be the same value, for

$$t = \frac{b}{s_b} = \frac{SP/SSx}{\sqrt{\dfrac{SSres}{SSx\,(n - 2)}}}
= \frac{SP\,\sqrt{n - 2}}{\sqrt{SSx\,SSy\,(1 - r^2)}}
= \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \tag{14.124a}$$
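
The equality of the two t statistics can be verified numerically. The sketch below computes $b/s_b$ and $r\sqrt{n - 2}/\sqrt{1 - r^2}$ from the same hypothetical data; the data and the function name `both_t_statistics` are assumptions made for illustration.

```python
import math

def both_t_statistics(xs, ys):
    """Return (t for testing beta = 0, t for testing rho = 0) from one sample."""
    n = len(xs)
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    b = sp / ss_x                               # sample slope
    ss_res = ss_y - sp ** 2 / ss_x              # residual sum of squares
    s_b = math.sqrt(ss_res / ((n - 2) * ss_x))  # standard error of the slope
    r = sp / math.sqrt(ss_x * ss_y)             # sample correlation
    t_slope = b / s_b
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t_slope, t_corr

# Hypothetical data
t_slope, t_corr = both_t_statistics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(t_slope, 4), round(t_corr, 4))
```

The two values agree to rounding error, as the algebra requires, so either statistic may be referred to the t table with n - 2 degrees of freedom.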

We have only introduced the subject of correlation. It has been expanded 
in many directions. For example, Snedecor [51] describes a method for testing 
the hypothesis that several independent sample correlation coefficients are 
estimates of the same population correlation coefficient. We have studied 
only one measure of correlation. There are other measures which have impor- 
tant specialized applications. The biserial and tetrachoric measures are useful 
in such areas as public health, education, and psychology. The interested 
reader may start his studies by referring to Refs. [18, 45, 47, 54] for detailed 
discussion. The rank correlation is also very useful when one wishes to know 
whether two rankings are in substantial agreement. The reader is referred 
to Kendall’s books [38, 39] as a starting point in studies of this type of 
correlation. The degree of association among more than two variables is 
studied with the use of partial and multiple correlation. References in this 
area are [6, 18, 24, 38]. References to these and other topics in correlation 
may be found in the bibliography and reference lists of the references 
already cited in this paragraph. 

14.10. TWO OR MORE SIMPLE REGRESSIONS (COVARIANCE ANALYSIS) 

In Chaps. 6, 8, 10, 11, 12, and 13 we discussed problems associated with 
comparing two or more means. There are times when we need to treat 
regression in a similar way. In this section we restrict our attention to some 
of the simplest problems involving the comparisons of two or more simple 
linear regressions. The statement of Example 14.9 should serve as an illustra- 
tion of the type of problem which one might wish to examine. 

Example 14.9. Suppose the penetration of different kinds of steel plates
by 50-caliber projectiles is being studied. Suppose that five projectiles are
fired at each of three plates. Let $y'_{iu}$ denote the depth to which the uth
projectile penetrates the ith plate, and let $x'_{iu}$ denote the initial velocity of the
uth projectile which is fired toward the ith plate. Let y and x denote the coded
values of $y'$ and $x'$, respectively. The measurements are given in Table 14.22.
(a) Compare the penetration into the first two plates, taking into account
the initial velocity. (b) Relate the regression analysis to the usual analysis
of variance. (c) Compare the penetration into the three plates. (The table
of data, calculations, and conclusions are given after the theory.)

Other examples in which similar questions arise are in the comparison of

1. Achievements in three algebra classes when the I.Q. of each student
is taken into account.

2. Weight gained by a certain type of animal using four diets when the
initial weight is known.

3. Tensile strength of paper tested by different processes where strength
depends on thickness.

4. Cost of living index in different cities where the index changes with
time.

5. Yield of a type of grain in different regions where yield depends on
production cost.

6. Volumes of two or more gases where volume is influenced by
temperature and pressure.

7. Logarithm of sieve residue for two tube mills for the production of
cement related to the production in tons per hour.

8. Number of a certain type of insect at different altitudes where number
emerging depends on temperature.

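Let the uth observation for the ith line be denoted by $(x_{iu}, y_{iu})$ with
$i = 1, \ldots, k$; $u = 1, \ldots, n_i$. Assume, as usual, that for a fixed $x_{iu}$ the
corresponding $y_{iu}$ is normally distributed with mean

$$\eta_{iu} = \alpha_i + \beta_i x_{iu} \qquad (i = 1, \ldots, k) \tag{14.125}$$

and variance

$$\sigma^2_{y \cdot x_i} = \sigma_i^2 \qquad (i = 1, \ldots, k)$$

Then, according to Sect. 14.2, the least-squares estimator $b_i$ of $\beta_i$ is normally
distributed with mean

$$\mu_{b_i} = \beta_i$$

and variance

$$\sigma^2_{b_i} = \frac{\sigma_i^2}{SSx_i}$$

where

$$SSx_i = \sum_u (x_{iu} - \bar{x}_i)^2 = \sum_u x_{iu}^2 - \frac{(\sum_u x_{iu})^2}{n_i} \qquad (i = 1, \ldots, k)$$

Further, the least-squares estimator $a_i$ of $\alpha_i$ is normally distributed with mean

$$\mu_{a_i} = \alpha_i$$

and variance

$$\sigma^2_{a_i} = \sigma_i^2\left(\frac{1}{n_i} + \frac{\bar{x}_i^2}{SSx_i}\right) = \frac{\sigma_i^2 \sum_u x_{iu}^2}{n_i\,SSx_i}$$

Two hypotheses which we are interested in testing are

$$H_{01}\colon\ \beta_1 = \cdots = \beta_k = \beta$$

and

$$H_{02}\colon\ \alpha_1 = \cdots = \alpha_k = \alpha \tag{14.126}$$
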

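We first consider the special case where k = 2. For this we require a
knowledge of the sampling distributions of $b_1 - b_2$ and $a_1 - a_2$. We assume that
the variances $\sigma_1^2$ and $\sigma_2^2$ are equal to $\sigma^2$. When $b_1$ and $b_2$ are normally and
independently distributed, it follows that $b_1 - b_2$ is normally distributed
with mean

$$\mu_{b_1 - b_2} = \beta_1 - \beta_2$$

and variance

$$\sigma^2_{b_1 - b_2} = \frac{\sigma^2}{SSx_1} + \frac{\sigma^2}{SSx_2} = \sigma^2\left(\frac{1}{SSx_1} + \frac{1}{SSx_2}\right)$$

Thus, under the null hypothesis that $\beta_1 = \beta_2$ the statistic

$$u = \frac{b_1 - b_2}{\sigma\sqrt{\dfrac{1}{SSx_1} + \dfrac{1}{SSx_2}}} \tag{14.127}$$

is normally distributed with mean zero and variance 1. If the common
population variance is unknown, we estimate it by pooling the two error
variances $s^2_{y \cdot x_1}$ and $s^2_{y \cdot x_2}$. The resulting pooled estimator is given by

$$s^2 = \frac{(n_1 - 2)\,s^2_{y \cdot x_1} + (n_2 - 2)\,s^2_{y \cdot x_2}}{(n_1 - 2) + (n_2 - 2)} = \frac{SSres_1 + SSres_2}{n_1 + n_2 - 4} \tag{14.128}$$

or

$$s^2 = \frac{SSy_1 + SSy_2 - b_1^2\,SSx_1 - b_2^2\,SSx_2}{n_1 + n_2 - 4} \tag{14.128a}$$

where

$$SSy_i = \sum_u y_{iu}^2 - \frac{(\sum_u y_{iu})^2}{n_i} \qquad (i = 1, 2)$$

Therefore, when $\beta_1 = \beta_2$, the statistic

$$t = \frac{b_1 - b_2}{s\sqrt{\dfrac{1}{SSx_1} + \dfrac{1}{SSx_2}}} \tag{14.129}$$

is distributed as t with $n_1 + n_2 - 4$ degrees of freedom.

Following a similar argument, we can show that, under the null
hypothesis that $\alpha_1 = \alpha_2$,

$$t = \frac{a_1 - a_2}{s\sqrt{\dfrac{\sum_u x_{1u}^2}{n_1\,SSx_1} + \dfrac{\sum_u x_{2u}^2}{n_2\,SSx_2}}} \tag{14.130}$$

is distributed as t with $n_1 + n_2 - 4$ degrees of freedom. In the particular
case where $\beta_1 = \beta_2 = \beta$, we find the pooled (weighted) estimator b of $\beta$
given by

$$b = \frac{SSx_1\,b_1 + SSx_2\,b_2}{SSx_1 + SSx_2} = \frac{SP_1 + SP_2}{SSx_1 + SSx_2} \tag{14.131}$$

where

$$SP_i = \sum_u x_{iu} y_{iu} - \frac{(\sum_u x_{iu})(\sum_u y_{iu})}{n_i} \qquad (i = 1, 2)$$

The intercepts $\alpha_1$ and $\alpha_2$ are then estimated by

$$a_1 = \bar{y}_1 - b\bar{x}_1 \qquad \text{and} \qquad a_2 = \bar{y}_2 - b\bar{x}_2$$

respectively. The pooled variance estimator is given by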

$$s^2 = \frac{SSy_1 + SSy_2 - b^2\,(SSx_1 + SSx_2)}{n_1 + n_2 - 3} \tag{14.132}$$


Table 14.22

Velocity (x) and Penetration (y) Data for Example 14.9

                     Plate I           Plate II          Plate III          Totals
Item                 x       y         x       y         x       y         x        y

1                   19      24        17      33        16      12
2                   28      24        11      35        31       8
3                   13      22         3      29        26      13
4                   20      26         8      28        35      25
5                    5      14        21      40        12       7

Totals              85     110        60     165       120      65       265      340

Sum of squares    1739    2508       924    5539      3262    1051      5925     9098
Sum of products       2004              2097              1737              5838

Calculated from above totals

SSx_i, SSy_i       294      88       204      94       382     206    1243.3   1391.3
SP_i                   134               117               177             -168.7
b_i                  0.4558            0.5735            0.4634
SSreg_i              61.07             67.10             82.01             22.88
SSres_i              26.93             26.90            123.99           1368.45



Thus, when /9, = /Ss = /9 we test the null hypothesis ai = cCi = a, using 
the statistic 




X\u 


n,SSxi ' tuSSXi 


+ 


(14.133) 


which is distributed as t with it, + ii. — 3 degrees of freedom. 

Now we return to Example 14.9. The coded data along with the totals 
are shown in Table 14.22, and the sums of squares, sums of products, slopes 
and sums of squares for regression, and residual for each plate are shown 
below the data. 

To illustrate the techniques for the comparison of two regression lines, we first use numerical information on plates I and II. To test the hypothesis $\beta_1 = \beta_2$ against the alternative $\beta_1 \ne \beta_2$ at the five per cent level, we use the $t$ statistic in Eq. (14.129). Since $t_{.025}(6) = 2.45$, the critical region is made up of all values of $t$ for which $t < -2.45$ or $t > 2.45$. Using the formulas of this section and numbers of Table 14.22, we find

$$s^2 = \frac{53.82}{6} = 8.970$$

$$s^2_{b_1 - b_2} = 8.970\left(\frac{1}{294} + \frac{1}{204}\right) = 0.07448$$

$$s_{b_1 - b_2} = 0.2729$$

$$t = \frac{0.4558 - 0.5735}{0.2729} = -0.4315$$

Since the sample $t$ statistic does not fall in the critical region, we fail to reject the hypothesis of equal slopes.
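The arithmetic of this slope comparison can be retraced with a short Python sketch. It assumes the coded observations for plates I and II as read from Table 14.22; the helper name `sums` and the variable names are ours, chosen only for illustration.

```python
import math

# Plates I and II of Example 14.9, as read from Table 14.22 (coded units).
x1, y1 = [19, 28, 13, 20, 5], [24, 24, 22, 26, 14]
x2, y2 = [17, 11, 3, 8, 21], [33, 35, 29, 28, 40]

def sums(x, y):
    """Return (SSx, SSy, SP): corrected sums of squares and products."""
    n = len(x)
    ssx = sum(v * v for v in x) - sum(x) ** 2 / n
    ssy = sum(v * v for v in y) - sum(y) ** 2 / n
    sp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ssx, ssy, sp

ssx1, ssy1, sp1 = sums(x1, y1)
ssx2, ssy2, sp2 = sums(x2, y2)
b1, b2 = sp1 / ssx1, sp2 / ssx2          # individual slopes

# Pooled residual variance, Eq. (14.128a), with n1 + n2 - 4 degrees of freedom.
df = len(x1) + len(x2) - 4
s2 = (ssy1 + ssy2 - b1 ** 2 * ssx1 - b2 ** 2 * ssx2) / df

# t statistic of Eq. (14.129) for the hypothesis of equal slopes.
t = (b1 - b2) / math.sqrt(s2 * (1 / ssx1 + 1 / ssx2))
print(round(b1, 4), round(b2, 4), round(s2, 3), round(t, 3))
```

The computed $t$ should agree with the value found above to within rounding of the intermediate figures; the critical value 2.45 still has to be taken from the $t$ table.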

Since we have two statistics, (14.130) and (14.133), for testing the hypothesis $\alpha_1 = \alpha_2 = \alpha$ against the alternative $\alpha_1 \ne \alpha_2$, we should make a decision, if possible, about the relative sizes of slopes $\beta_1$ and $\beta_2$. Sometimes we know whether or not $\beta_1$ is approximately equal to $\beta_2$; other times we may wish to run a preliminary test of the hypothesis $\beta_1 = \beta_2$ to determine if $b_1$ and $b_2$ are close enough to compute a weighted common slope $\bar b$. In the above procedure the value of the sample statistic in absolute value is actually less than the two-sided 50 per cent $t$ value. Thus, we feel safe in pooling the sample slopes and using (14.133) to test the hypothesis $\alpha_1 = \alpha_2$. For this purpose we find

$$\bar b = \frac{134 + 117}{294 + 204} = 0.5040$$

$$s'^2 = \frac{88 + 94 - 0.5040(134 + 117)}{7} = 7.927$$

$$s^2_{a_1 - a_2} = 7.927\left(\frac{1739}{5(294)} + \frac{924}{5(204)}\right) = 16.56$$

$$s_{a_1 - a_2} = 4.07$$

$$a_1 = \frac{110 - (0.5040)85}{5} = 13.43, \qquad a_2 = \frac{165 - (0.5040)60}{5} = 26.97$$

$$t = \frac{13.43 - 26.97}{4.07} = -3.33$$

The lower 0.025 $t$ value with $5 + 5 - 3 = 7$ degrees of freedom is $-2.36$. Since $-3.33$ is less than $-2.36$, we reject the hypothesis and conclude that the true intercept $\alpha_1$ of the regression equation for plate I is actually smaller than the true intercept $\alpha_2$.
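The intercept comparison can be checked in the same way. The sketch below assumes the summary figures of Table 14.22 for plates I and II and follows Eqs. (14.131) to (14.133) as written; the variable names are ours. Carried at full precision, the second intercept comes out near 26.95 rather than the printed 26.97, a small difference that does not change the conclusion.

```python
import math

# Summary figures for plates I and II, taken from Table 14.22.
n1, n2 = 5, 5
sum_x1, sum_y1, sum_x1_sq = 85, 110, 1739
sum_x2, sum_y2, sum_x2_sq = 60, 165, 924
ssx1, ssy1, sp1 = 294, 88, 134
ssx2, ssy2, sp2 = 204, 94, 117

# Pooled slope, Eq. (14.131), and pooled variance, Eq. (14.132).
b_bar = (sp1 + sp2) / (ssx1 + ssx2)
s2_prime = (ssy1 + ssy2 - b_bar ** 2 * (ssx1 + ssx2)) / (n1 + n2 - 3)

# Intercepts of the two parallel fitted lines.
a1 = (sum_y1 - b_bar * sum_x1) / n1
a2 = (sum_y2 - b_bar * sum_x2) / n2

# Standard error and t statistic in the form of Eq. (14.133).
se = math.sqrt(s2_prime * (sum_x1_sq / (n1 * ssx1) + sum_x2_sq / (n2 * ssx2)))
t = (a1 - a2) / se
print(round(b_bar, 4), round(s2_prime, 3), round(se, 2), round(t, 2))
```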

Putting the information of the two tests together, we conclude that velocity appears to follow the same law, but that there is a marked difference in the depth of penetration on the two plates. That is, in terms of the fitted regression lines, we may say that the slopes are equal but the parallel lines are significantly far apart. This is shown in Fig. 14.6. The estimated equations for the two plates are

$$\hat\eta_1 = 13.43 + 0.5040x$$

$$\hat\eta_2 = 26.97 + 0.5040x$$

[Fig. 14.6. Regression Lines for Plates I and II (horizontal axis: velocity)]

The difference

$$\hat\eta_2 - \hat\eta_1 = 13.54$$

which does not depend on $x$, is the best estimate of the difference in penetration for any velocity in the range from 5 to 21. This is due to the fact that the pooled regression slope was used. If we had used the slopes shown in Table 14.22, the regression equations would be

$$\hat\eta_1 = 14.25 + 0.4558x$$

$$\hat\eta_2 = 26.12 + 0.5735x$$

The difference

$$\hat\eta_2 - \hat\eta_1 = 11.87 + 0.1177x$$

is a function of $x$. This brings out one of the advantages of pooling the slopes when possible.

In the simple case where the value of the estimated difference, $d_{y/x} = \hat\eta_1 - \hat\eta_2$, does not depend on $x$, one may not be satisfied with a point estimate; one may require a $100(1 - \alpha)$ per cent confidence interval. In order to find the distribution of $d_{y/x}$, we replace $a_i$ by $\bar y_i - \bar b\bar x_i$ $(i = 1, 2)$ so that

$$d_{y/x} = \hat\eta_1 - \hat\eta_2 = [(\bar y_1 - \bar b\bar x_1) + \bar b x] - [(\bar y_2 - \bar b\bar x_2) + \bar b x]$$

or

$$d_{y/x} = \bar y_1 - \bar y_2 - \bar b(\bar x_1 - \bar x_2) \tag{14.134}$$

Since $\bar y_1$, $\bar y_2$, and $\bar b$ are normally and independently distributed, it follows that $d_{y/x}$ is normally distributed with mean $\delta_{y/x} = E(d_{y/x}) = \eta_1 - \eta_2$ and variance

$$\sigma^2_d = \sigma^2\left[\frac{1}{n_1} + \frac{1}{n_2} + \frac{(\bar x_1 - \bar x_2)^2}{SSx_1 + SSx_2}\right] \tag{14.135}$$

In case $\sigma^2$ is unknown, we may replace it by $s'^2$ and use the $t$ distribution with $n_1 + n_2 - 3$ degrees of freedom to establish a confidence interval or test a hypothesis about $\delta_{y/x}$. The reader should note that Eq. (14.133) may be used to find a confidence interval for the intercept difference $\alpha_1 - \alpha_2$, but that Eq. (14.134) should be used when one is interested in finding an interval for $\eta_1 - \eta_2$ for any value of $x$. In a similar way we could use Eq. ( .39) to find the distribution of $\hat\eta_1 - \hat\eta_2$ when the true slopes $\beta_1$ and $\beta_2$ are different. But any difference, $\hat\eta_{10} - \hat\eta_{20}$, would be a function of $x_0$. That is, the difference would change with $x$, and thus $\hat\eta_{10} - \hat\eta_{20}$ would not generally be a very useful statistic.
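As an illustration of Eqs. (14.134) and (14.135), the following sketch computes a 95 per cent confidence interval for the difference between the two parallel lines of Example 14.9. It assumes the means, sums of squares, pooled slope 0.5040 and pooled variance 7.927 quoted in this section, with the tabled value $t_{.025}(7) = 2.36$; the variable names are ours.

```python
import math

n1, n2 = 5, 5
xbar1, ybar1 = 17.0, 22.0      # plate I means (85/5 and 110/5)
xbar2, ybar2 = 12.0, 33.0      # plate II means (60/5 and 165/5)
ssx1, ssx2 = 294.0, 204.0
b_bar = 0.5040                 # pooled slope
s2_prime = 7.927               # pooled variance with n1 + n2 - 3 = 7 d.f.
t_crit = 2.36                  # upper 0.025 point of t with 7 d.f., as quoted in the text

# Point estimate of the difference between the lines, Eq. (14.134).
d = ybar1 - ybar2 - b_bar * (xbar1 - xbar2)

# Estimated variance of d: Eq. (14.135) with s'^2 in place of sigma^2.
var_d = s2_prime * (1 / n1 + 1 / n2 + (xbar1 - xbar2) ** 2 / (ssx1 + ssx2))

half_width = t_crit * math.sqrt(var_d)
lower, upper = d - half_width, d + half_width
print(round(d, 2), round(lower, 2), round(upper, 2))
```

Because the whole interval lies below zero, it leads to the same conclusion as the test of intercepts: the line for plate I lies below the line for plate II.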

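When $x_0 = \bar x_i$, the true regression equation (14.125) becomes

$$\mu_{y_i} = \alpha_i + \beta_i\bar x_i$$

Hence, we may write the true regression equation as

$$\eta_i = \mu_{y_i} + \beta_i(x - \bar x_i)$$

If two lines are identical, the slopes $\beta_1$ and $\beta_2$ are equal to a common slope $\beta$, and $\eta_1 = \eta_2$. Therefore

$$\mu_{y_1} + \beta x - \beta\bar x_1 = \mu_{y_2} + \beta x - \beta\bar x_2$$

or

$$\beta = \frac{\mu_{y_1} - \mu_{y_2}}{\bar x_1 - \bar x_2} \tag{14.137}$$

As an estimator of $\beta$ we calculate

$$\tilde b = \frac{\bar y_1 - \bar y_2}{\bar x_1 - \bar x_2} \tag{14.138}$$

Therefore, the estimated regression equation with slope $\tilde b$ passes through the points $(\bar x_1, \bar y_1)$ and $(\bar x_2, \bar y_2)$. If the two theoretical regression lines are actually the same, the slope $\tilde b$ will only deviate at random from the slope, $\bar b$, common to the two estimated regression lines. Thus, the distribution of $\tilde b - \bar b$ could be used to determine if $\tilde b$ is significantly different from $\bar b$, that is, to determine if two theoretical regression lines which have the same slope are identical.

Now $\bar b$ is calculated from the variation within sets and $\tilde b$ from variation between sets. Therefore, $\tilde b$ and $\bar b$ are independently distributed, and the variance of $\tilde b - \bar b$ is

$$\sigma^2_{\tilde b - \bar b} = \sigma^2\left[\frac{1}{(\bar x_1 - \bar x_2)^2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right) + \frac{1}{SSx_1 + SSx_2}\right] \tag{14.139}$$

Thus, under the assumption that the two true regression lines are identical, the statistic

$$\frac{\tilde b - \bar b}{\sigma_{\tilde b - \bar b}} \tag{14.140}$$

is normally distributed with mean zero and variance one. If $\sigma^2$ in Eq. (14.139) is replaced by $s'^2$ and $\sigma_{\tilde b - \bar b}$ by $s_{\tilde b - \bar b}$, the resulting statistic

$$t = \frac{\tilde b - \bar b}{s_{\tilde b - \bar b}} \tag{14.141}$$

is distributed as the Student $t$ with $n_1 + n_2 - 3$ degrees of freedom.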

In Example 14.9 we find

$$s'^2 = 7.927$$

$$s^2_{\tilde b - \bar b} = 7.927\left[\frac{1}{(17 - 12)^2}\left(\frac{1}{5} + \frac{1}{5}\right) + \frac{1}{294 + 204}\right] = 0.1427$$

$$s_{\tilde b - \bar b} = 0.378$$

$$\tilde b = \frac{22 - 33}{17 - 12} = -2.200$$

$$t = \frac{-2.200 - 0.5040}{0.378} = -7.15$$

Since $-7.15$ is less than $-t_{.025}(7) = -2.36$, we reject the hypothesis of identical true regression lines and conclude that there are two true regression lines. This is indicated in Fig. 14.6. Note that this test does not indicate whether the lines differ with respect to slopes, with respect to intercepts, or with respect to both slopes and intercepts. It simply indicates that the true lines are different.

So far we have considered special cases of covariance problems. Perhaps 
all of the discussion in this section is not necessary, but it was given in the 
hope that the reader could be led from the usual simple linear regression to 
the place where the covariance analysis can be easily generalized to any 
number of simple linear regressions. 

Now we consider the comparison of the three plates in Example 14.9 
with data and calculations in Table 14.22. If we wished to compare the mean 
penetration of the plates, ignoring velocity, we would use a one-way classi- 
fication analysis. That is, to test the hypothesis of equality of means, we 
would find a mean square for within and a mean square for among plates. 
Under the assumption of the hypothesis these mean squares are independently 
distributed. Thus, we would form a ratio of these mean squares and use the 
F table to compare with the computed F ratio. If we wish to compare the 


mean penetration of plates, taking into account the velocity, there is an analogous test procedure (which we leave to the student to justify). In this procedure independent estimates of variation about the regression line are found under the assumption that the lines are the same. Hence, we find estimates of the error variation about regression from within plates and from among plates after taking into account the dependence of y on x. The necessary calculations are shown in Table 14.23. (Since we wish to give the procedure for testing several hypotheses, a compact table which is useful for each hypothesis is given.)

Table 14.23

Analysis of Covariance for Example 14.9

                                             Degrees of                        Degrees of
               SSx        SP        SSy       Freedom      SSreg     SSres      Freedom
Plate I       294        134        88           4          61.07     26.93        3
Plate II      204        117        94           4          67.10     26.90        3
Plate III     382        177       206           4          82.01    123.99        3
Among         363.33    -596.67   1003.33        2         979.87     23.46        1
Within        880.00     428.00    388.00       12         208.16    179.84       11
Total        1243.33    -168.67   1391.33       14          22.88   1368.45       13

If we did not make use of the possible effect that the x values (velocities) might have on the mean penetrations of the plates, the F value of the usual analysis of variance test would be

$$F = \frac{(1391.33 - 388)/2}{388/12} = 15.52$$

Since $F_{.05}(2, 12) = 3.89$, we reject the hypothesis of equal means, and conclude that we can recognize significant differences in the mean penetration in different plates when the initial velocity is ignored. Taking into account the initial velocity, we compute in an analogous way, using the SSres and adjacent degrees of freedom columns, the value

$$F = \frac{(1368.45 - 179.84)/2}{179.84/11} = 36.35 \tag{14.142}$$

Since $F_{.05}(2, 11) = 3.98$, we reject the hypothesis of equal means, and conclude that the mean depths of penetration in the plates differ, after making allowances for the differences in initial velocities.
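The Among, Within, and Total rows of Table 14.23 and the two F ratios just described can be rebuilt from the raw observations. The sketch below assumes the coded data of Table 14.22; the function and variable names are ours and are not part of the text.

```python
# Coded data of Table 14.22: (velocity x, penetration y) for each plate.
plates = {
    "I":   ([19, 28, 13, 20, 5],  [24, 24, 22, 26, 14]),
    "II":  ([17, 11, 3, 8, 21],   [33, 35, 29, 28, 40]),
    "III": ([16, 31, 26, 35, 12], [12, 8, 13, 25, 7]),
}

def sums(x, y):
    """Corrected sums of squares and products, in table order (SSx, SP, SSy)."""
    n = len(x)
    ssx = sum(v * v for v in x) - sum(x) ** 2 / n
    ssy = sum(v * v for v in y) - sum(y) ** 2 / n
    sp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ssx, sp, ssy

def residual(ssx, sp, ssy):
    """Residual sum of squares about a fitted line: SSy - SP^2 / SSx."""
    return ssy - sp ** 2 / ssx

per_plate = [sums(x, y) for x, y in plates.values()]
within = tuple(sum(row[j] for row in per_plate) for j in range(3))

all_x = [v for x, _ in plates.values() for v in x]
all_y = [v for _, y in plates.values() for v in y]
total = sums(all_x, all_y)
among = tuple(tot - w for tot, w in zip(total, within))

k, n = len(plates), len(all_x)

# Usual analysis of variance on y alone (velocity ignored).
f_anova = ((total[2] - within[2]) / (k - 1)) / (within[2] / (n - k))

# Analogous ratio built from the residual sums of squares (velocity allowed for).
ss_res_total = residual(*total)
ss_res_within = residual(*within)
f_ancova = ((ss_res_total - ss_res_within) / (k - 1)) / (ss_res_within / (n - k - 1))

print([round(v, 2) for v in among], round(f_anova, 2), round(f_ancova, 2))
```

Both ratios are then compared with the tabled 5 per cent points, $F_{.05}(2, 12)$ and $F_{.05}(2, 11)$, exactly as in the text.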

Before describing other tests in covariance analysis, we explain the computations in Table 14.23. Each degrees of freedom column is to be used with the adjacent column to the left. The entries, with the exception of degrees of freedom, in the rows labeled Plate I, Plate II, Plate III, and Total, are brought forward from Table 14.22. The first three entries in the row for within groups are obtained by adding over plates. For example, the first entry is 294 + 204 + 382 = 880. The first three entries in the row for among means are obtained by subtracting the within values from the corresponding total values. The SSreg and SSres for both within groups and among means are found in the usual way.

If $k$ simple linear regressions have a common variance $\sigma^2$, we may, by generalizing Eq. (14.128), write the pooled estimator of $\sigma^2$ as

$$s_1^2 = \frac{\sum_i SS\mathrm{res}_i}{\sum_i n_i - 2k} \tag{14.143}$$

This variance estimator represents the variation about the $k$ fitted regression lines. Since it is known that

$$\frac{(b_i - \beta_i)\sqrt{SSx_i}}{\sigma} \qquad (i = 1, \ldots, k)$$

is normally distributed with mean zero and variance one, then when $\beta_1 = \cdots = \beta_k = \beta$,

$$u_i = \frac{(b_i - \beta)\sqrt{SSx_i}}{\sigma}$$

is normally distributed with mean zero and variance one. Thus, for independent $b_i$ it follows that

$$\sum_i u_i^2 = \sum_i \frac{(b_i - \beta)^2 SSx_i}{\sigma^2} \tag{14.144}$$

is distributed as $\chi^2$ with $k$ degrees of freedom. Generalizing Eq. (14.131), we find that the least-squares estimator of $\beta$ is given by

$$\bar b = \frac{\sum_i SSx_i b_i}{\sum_i SSx_i}$$

Since

$$b_i - \beta = (b_i - \bar b) + (\bar b - \beta)$$

we may, after substituting in the right-hand side of Eq. (14.144), write

$$\sum_i (b_i - \beta)^2 SSx_i = \sum_i (b_i - \bar b)^2 SSx_i + (\bar b - \beta)^2\sum_i SSx_i$$

Thus, from the partition theory of the $\chi^2$ distribution, it follows that

$$\sum_i (b_i - \bar b)^2 SSx_i \tag{14.145}$$

is distributed as $\sigma^2\chi^2$ with $k - 1$ degrees of freedom, that

$$\frac{(\bar b - \beta)\sqrt{\sum_i SSx_i}}{\sigma} \tag{14.146}$$

is normally distributed with mean zero and variance one, and that (14.145) and $\bar b$ are independently distributed. Letting

$$s_2^2 = \frac{\sum_i (b_i - \bar b)^2 SSx_i}{k - 1} \tag{14.147}$$

we have an estimator of $\sigma^2$ which is independent of $s_1^2$ when the hypothesis

$$\beta_1 = \beta_2 = \cdots = \beta_k = \beta \tag{14.148}$$

is true. Thus, the ratio

$$F = \frac{s_2^2}{s_1^2} \tag{14.149}$$

is distributed as $F$ with $k - 1$ and $\sum n_i - 2k$ degrees of freedom. We may use (14.149) to test the hypothesis in (14.148), that is, the hypothesis of parallel regression lines. The reader should realize that this is the generalization of the test of the hypothesis $\beta_1 = \beta_2$ in which the $t$ distribution was applied.

To illustrate this test, we use the SSres column of Table 14.23 and note that

$$\sum_i (b_i - \bar b)^2 SSx_i = SS\mathrm{res\ (within)} - (SS\mathrm{res}_1 + SS\mathrm{res}_2 + SS\mathrm{res}_3) \tag{14.150}$$

The proof of Eq. (14.150) is left as an exercise for the reader. Now the estimates $s_1^2$ and $s_2^2$ are

$$s_1^2 = \frac{26.93 + 26.90 + 123.99}{9} = \frac{177.82}{9} = 19.76$$

and

$$s_2^2 = \frac{179.84 - 177.82}{2} = 1.01$$

respectively. Therefore

$$F = \frac{1.01}{19.76} = 0.0511$$

Since 0.0511 is less than $F_{.05}(2, 9) = 4.26$, we fail to reject the hypothesis of parallelism of the true regression lines.
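The test of parallelism can be traced numerically as follows. The sketch assumes the three plate rows of Table 14.23 with five observations per plate; it also checks the identity of Eq. (14.150) on these figures. All names are ours.

```python
# Plate rows of Table 14.23 as (SSx, SP, SSy); each plate has n_i = 5 observations.
rows = [(294, 134, 88), (204, 117, 94), (382, 177, 206)]
n_i = [5, 5, 5]
k = len(rows)

# Residual sum of squares about each individual line, and the estimator of Eq. (14.143).
ss_res_each = [ssy - sp ** 2 / ssx for ssx, sp, ssy in rows]
s1_sq = sum(ss_res_each) / (sum(n_i) - 2 * k)

# Individual slopes and the pooled slope (weighted by SSx_i).
b = [sp / ssx for ssx, sp, _ in rows]
ssx_w = sum(r[0] for r in rows)
sp_w = sum(r[1] for r in rows)
ssy_w = sum(r[2] for r in rows)
b_bar = sp_w / ssx_w

# Variation among the slopes, expression (14.145), and the estimator of Eq. (14.147).
ss_slopes = sum((bi - b_bar) ** 2 * r[0] for bi, r in zip(b, rows))
s2_sq = ss_slopes / (k - 1)

# F ratio of Eq. (14.149) with k - 1 and sum(n_i) - 2k degrees of freedom.
f_parallel = s2_sq / s1_sq

# Right-hand side of Eq. (14.150), for comparison with ss_slopes.
ss_res_within = ssy_w - sp_w ** 2 / ssx_w
rhs_14_150 = ss_res_within - sum(ss_res_each)
print(round(s1_sq, 2), round(s2_sq, 2), round(f_parallel, 4))
```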

If the true regression lines are parallel, we may combine the two estimators $s_1^2$ and $s_2^2$ to obtain the estimator

$$s_4^2 = \frac{\left(\sum n_i - 2k\right)s_1^2 + (k - 1)s_2^2}{\left(\sum n_i - 2k\right) + (k - 1)} \tag{14.151}$$

with $\left(\sum n_i - 2k\right) + (k - 1) = \sum n_i - (k + 1)$ degrees of freedom. Further, if the true regression lines are identical, a straight line with the common slope $\beta$ passes through the $k$ points $(\bar x_1, \mu_{y_1}), \ldots, (\bar x_k, \mu_{y_k})$. The slope of this line is estimated by

$$\tilde b = \frac{\sum_i n_i(\bar x_i - \bar x)(\bar y_i - \bar y)}{\sum_i n_i(\bar x_i - \bar x)^2} \tag{14.152}$$

where

$$\bar x = \frac{\sum_i n_i\bar x_i}{\sum_i n_i} \qquad \text{and} \qquad \bar y = \frac{\sum_i n_i\bar y_i}{\sum_i n_i}$$

The regression equation of best fit for the $k$ points $(\bar x_1, \bar y_1), \ldots, (\bar x_k, \bar y_k)$ is

$$\tilde\eta_i = \bar y + \tilde b(\bar x_i - \bar x) \tag{14.153}$$

and the variation of the $k$ sample pairs of means about this line is given by the estimator

$$s_3^2 = \frac{\sum_i n_i[\bar y_i - \bar y - \tilde b(\bar x_i - \bar x)]^2}{k - 2} \tag{14.154}$$

Thus, the linearity of the true regression line through points $(\bar x_1, \mu_{y_1}), \ldots, (\bar x_k, \mu_{y_k})$ may be tested by comparing the variance estimator $s_3^2$ with $s_1^2$ or the variance estimator given in (14.151), whichever seems appropriate.
In Example 14.9

$$s_3^2 = \frac{23.46}{3 - 2} = 23.46$$

since it can be shown that

$$SS\mathrm{res\ (among)} = \sum_i n_i[\bar y_i - \bar y - \tilde b(\bar x_i - \bar x)]^2 \tag{14.155}$$

Further, the variance estimator given by (14.151), which is also the one used in the denominator of (14.142), is found to be

$$s_4^2 = \frac{9(19.76) + 2(1.01)}{9 + 2} = \frac{179.84}{11} = 16.35$$

Since the ratio

$$F = \frac{23.46}{16.35} = 1.43$$

is smaller than $F_{.05}(1, 11) = 4.84$, we fail to reject the hypothesis. In fact, since 1.43 is roughly the upper 25 per cent point of the $F$ distribution, we feel that any deviation from linearity in the true regression equation is not sizeable.

If the hypothesis that the $k$ true regression lines are identical is correct, the slope $\tilde b$ is a random normal variable with mean $\beta$ and variance

$$\frac{\sigma^2}{\sum_i n_i(\bar x_i - \bar x)^2}$$

Thus, it follows that an estimator of the variation of $\tilde b$ and $\bar b$ about the weighted mean $b$ of $\tilde b$ and $\bar b$ is given by

$$s_5^2 = (\tilde b - b)^2\sum_i n_i(\bar x_i - \bar x)^2 + (\bar b - b)^2\sum_i SSx_i \tag{14.156}$$

Hence, a test of the identity of $k$ regression lines is given by comparing $s_5^2$ with the variance in (14.151), or with the variance obtained by combining $s_4^2$ and $s_3^2$. It can be shown that

$$SS\mathrm{res\ (total)} - SS\mathrm{res\ (among)} - SS\mathrm{res\ (within)} = s_5^2 \tag{14.157}$$

Thus, for Example 14.9 we have

$$s_5^2 = 1368.45 - 179.84 - 23.46 = 1165.15$$

so that the computed $F$ ratio is

$$F = \frac{1165.15}{16.35} = 71.26$$

Since 71.26 is larger than $F_{.05}(1, 11) = 4.84$, we reject the hypothesis and conclude that the three regression lines are not identical. This test is made under the assumptions that the regression means lie on a straight line and that the slopes of the regression lines are all the same. If these assumptions cannot be made, then the appropriate test to use is the one given in Eq. (14.142).
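The tests of linearity of the means and of identity of the lines use only the Among, Within, and Total rows of Table 14.23, so they are easily retraced. The sketch below assumes those rows as printed (rounded to two decimals, so the last digits are sensitive to rounding). The closed form used as a cross-check, $(\tilde b - \bar b)^2 / (1/SSx_{among} + 1/SSx_{within})$, is our own algebraic restatement of the difference in Eq. (14.157), not a formula quoted from the text.

```python
# Rows of Table 14.23 as (SSx, SP, SSy).
among = (363.33, -596.67, 1003.33)
within = (880.00, 428.00, 388.00)
total = (1243.33, -168.67, 1391.33)
k, n = 3, 15

def residual(ssx, sp, ssy):
    """Residual sum of squares about a fitted line: SSy - SP^2 / SSx."""
    return ssy - sp ** 2 / ssx

# Variation of the plate means about the line of means, 1 = k - 2 d.f. (Eqs. 14.154, 14.155).
var_means = residual(*among) / (k - 2)

# Combined within-plates variance with n - (k + 1) = 11 d.f. (Eq. 14.151).
var_within = residual(*within) / (n - k - 1)

# One-degree-of-freedom component of Eq. (14.157).
var_identity = residual(*total) - residual(*among) - residual(*within)

f_linearity = var_means / var_within
f_identity = var_identity / var_within

# Cross-check: the same component expressed through the two slopes.
b_means = among[1] / among[0]       # slope of the line through the plate means
b_pooled = within[1] / within[0]    # common within-plate slope
direct = (b_means - b_pooled) ** 2 / (1 / among[0] + 1 / within[0])
print(round(f_linearity, 2), round(f_identity, 2), round(direct, 2))
```

Both ratios are referred to $F_{.05}(1, 11) = 4.84$ as in the text.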

In the above tests we used $k = 3$ individual regression lines, $k = 3$ parallel lines with slope $\bar b$, one regression line of means with slope $\tilde b$, and one regression line for all observations with slope obtained by getting a weighted mean, $b$, of $\tilde b$ and $\bar b$. The reader should plot scatter diagrams and regression lines until he is thoroughly familiar with the interrelations of all these lines and the corresponding tests. To "see" the residual in a scatter diagram, it is sometimes helpful to transform the data so that the means for x and y in the regression equations fall at the same point on the graph. It might also be informative to study the partition of the component parts which are indicated in the identity

$$y_{iu} - \bar y - b(x_{iu} - \bar x) = [y_{iu} - \bar y_i - b_i(x_{iu} - \bar x_i)] + [(b_i - \bar b)(x_{iu} - \bar x_i)] + [\bar y_i - \bar y - \tilde b(\bar x_i - \bar x)] + [(\bar b - b)(x_{iu} - \bar x_i) + (\tilde b - b)(\bar x_i - \bar x)] \tag{14.158}$$

The identity indicates that the variation about a single over-all regression line is equal to the sum of the variation of the observations about $k$ individual regression lines plus the variation among the slopes of the $k$ lines plus the variation of the means about the regression line of means plus the variation for the difference between the mean slope $\bar b$ and the slope $\tilde b$ of the regression line of means. This identity leads to a sum of squares identity which contains the components used in the above tests.
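The partition described by this identity can be verified numerically. The sketch below assumes the coded data of Table 14.22, evaluates the four bracketed terms for every observation, and accumulates their squares; the names `b_total`, `b_pooled`, and `b_means` are ours. Carried at full precision, the components agree with the printed sums of squares to within rounding (the component for the means comes out near 23.49 against the printed 23.46).

```python
# Coded (velocity, penetration) data of Table 14.22, one (x, y) pair of lists per plate.
plates = [
    ([19, 28, 13, 20, 5],  [24, 24, 22, 26, 14]),
    ([17, 11, 3, 8, 21],   [33, 35, 29, 28, 40]),
    ([16, 31, 26, 35, 12], [12, 8, 13, 25, 7]),
]

def mean(v):
    return sum(v) / len(v)

def ss(x):
    """Sum of squared deviations of x about its mean."""
    m = mean(x)
    return sum((a - m) ** 2 for a in x)

def slope(x, y):
    """Least-squares slope SP / SSx."""
    xb, yb = mean(x), mean(y)
    return sum((a - xb) * (c - yb) for a, c in zip(x, y)) / ss(x)

all_x = [v for x, _ in plates for v in x]
all_y = [v for _, y in plates for v in y]
x_bar, y_bar = mean(all_x), mean(all_y)

b_total = slope(all_x, all_y)                      # single over-all line
b_i = [slope(x, y) for x, y in plates]             # individual lines
ssx_i = [ss(x) for x, _ in plates]
b_pooled = sum(s * b for s, b in zip(ssx_i, b_i)) / sum(ssx_i)   # common within-plate slope
b_means = (sum(len(x) * (mean(x) - x_bar) * (mean(y) - y_bar) for x, y in plates)
           / sum(len(x) * (mean(x) - x_bar) ** 2 for x, _ in plates))  # line of means

parts = [0.0, 0.0, 0.0, 0.0]   # sums of squares of the four bracketed terms
lhs_ss = 0.0                   # sum of squares of the left-hand side
max_gap = 0.0                  # largest disagreement between the two sides
for (x, y), bi in zip(plates, b_i):
    xb, yb = mean(x), mean(y)
    for xv, yv in zip(x, y):
        terms = [
            yv - yb - bi * (xv - xb),
            (bi - b_pooled) * (xv - xb),
            yb - y_bar - b_means * (xb - x_bar),
            (b_pooled - b_total) * (xv - xb) + (b_means - b_total) * (xb - x_bar),
        ]
        lhs = yv - y_bar - b_total * (xv - x_bar)
        max_gap = max(max_gap, abs(lhs - sum(terms)))
        lhs_ss += lhs ** 2
        for j, term in enumerate(terms):
            parts[j] += term ** 2

print([round(p, 2) for p in parts], round(lhs_ss, 2))
```

The cross products between the four terms vanish when summed over all observations, which is why the four sums of squares add up to the residual sum of squares about the single over-all line.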

There are other tests which can be made in a covariance analysis of a 
one-way classification, but those given in this section illustrate the kinds of 
tests which are usually made. Analysis of covariance can also be extended 
to include two-way and multiway classifications with a single linear control 
variable x, to include one-, two-, and multiway classifications with more than 
one linear control variable, and to include cases where the regression is not 
linear. 

For further study the reader is referred to Refs. [5, 9, 25, 33, 36, 37, 51]. 
A special issue on the analysis of covariance was published by the Biometric 
Society in 1957. Articles in this issue which might be of particular interest 
are Refs. [11, 50, 58]. Other references to covariance analysis can be found 
in the references already cited. 


14.11. EXERCISES 

14.33. Table 14.24 gives the height (inches) and weight (pounds) of a random sample of 24 college freshmen (male). (a) Find the linear regression of height on weight. (b) Find the linear regression of weight on height. (c) Find the correlation coefficient. (d) Determine a 90 per cent confidence interval for the true correlation coefficient. (e) Test the hypothesis $\rho = 0.7$ at the five per cent level. (f) Write a summary statement in terms of your findings in (a), (b), (c), and (d).

Table 14.24

Height   Weight      Height   Weight      Height   Weight
  73      190          69      157          70      166
  71      179          67      150          64      124
  66      146          71      170          67      149
  67      145          71      172          73      197
  66      144          65      132         [?]      173
 [?]      164          73      187          69      159
  72      183          74      205          73      195
  74      200          71      175          74      195

14.34. Let f(x, y) be the bivariate normal density function given in (3.57). Prove that y for a given value of x is normally distributed with mean

$$\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x)$$

and variance

$$\sigma_y^2(1 - \rho^2)$$

14.35. Prove that $E(r) \ne \rho$ when $\rho \ne 0$.

14.36. (a) Give the coordinates (x, y) of five points which have a correlation coefficient of $-0.8$. (b) Leaving four of the points found in (a) fixed, determine the fifth point so that the correlation coefficient is $-0.4$.

14.37. For a sample of size 18 the correlation coefficient is $-0.35$. Find a 95 per cent confidence interval for $\rho$.

14.38. A random sample of size 52 has a correlation coefficient of 0.23. Find a 95 per cent confidence interval for $\rho$, using Table X and using the arc tanh transformation.

14.39. Let $(x_1, y_1), \ldots, (x_n, y_n)$ denote n pairs of observations. Let $x'_1, \ldots, x'_n$ and $y'_1, \ldots, y'_n$ denote the ranks corresponding to observations $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$, respectively. That is, $x'_1, \ldots, x'_n$ (and $y'_1, \ldots, y'_n$) is some arrangement of the positive integers $1, \ldots, n$. Prove that the correlation between ranks is

$$1 - \frac{6\sum_i (x'_i - y'_i)^2}{n(n^2 - 1)}$$

Assume that no two x (or y) values are the same.

14.40. Prove that the ratio in (14.130) is distributed as $t$ with $n_1 + n_2 - 4$ degrees of freedom.

14.41. Prove Eq. (14.150).

14.42. Prove Eq. (14.157).

14.43. Starting with Eq. (14.158), derive the sum of squares identity used in the test procedures of Sect. 14.10.

14.44. Plot all regression lines discussed in Sect. 14.10.

14.45. The data in Table 14.25 give the average number of parts manufactured per hour, x, and the production cost per part, y, for five factories in three cities. Use the methods of Sect. 14.10 to analyze these data. Write a summary report of your findings, constructing regression lines for illustrative purposes.

Table 14.25

     City A             City B             City C
   x       y          x       y          x       y
  11     $1.60       19     $1.00       17     $0.90
  13      1.30       19      0.50       13      1.40
  10      1.50       31      0.70       16      0.70
  11      1.30       17      0.70       18      0.60
  11      1.50       10      2.00       10      2.00

14.46. In a study of grades, y, in three algebra classes the effect of I.Q., x, is taken into account. Use the methods of Sect. 14.10 to analyze the data in Table 14.26. Write a summary report of your findings, constructing regression lines for illustrative purposes.

Table 14.26

    Class I            Class II           Class III
   x       y          x       y          x       y
   99     81         101     85         108     90
  103     84          95     78          99     75
  108     81         105     93         126     99
  109     79          94     80         119     97
   96    [?]         101     83         109     93
  104     79         126     95         110    [?]
   96     81         107     90         105     88
  105     85         104     89         119     93
   94     72         120     97         128     90
   91     79          95     75          94     78
                     103     79         103     82
                      89     68


ANALYSIS OF COUNTED DATA


In the preceding chapters most of the discussion was devoted to the 
treatment of quantitative data measured on a continuous scale. This chapter 
is concerned with problems relating to data which occur as frequencies, 
or counts, in categories. A category may or may not fall on a continuous 
scale, and it may be either quantitative or qualitative. The binomial, 
Poisson, hypergeometric, and multinomial distributions serve as models. 
It is shown how the chi-square distribution is useful in obtaining good 
approximations to probabilities applied in goodness-of-fit and contingency- 
table problems. Simplified computing formulas are given for important cases. 

15.1. INTRODUCTION 

We have already considered the nature of models, along with a few of 
the problems, associated with counting the number of objects in each of 
several categories. In Chap. 3 the dichotomous and Poisson distributions 
were introduced; in Chap. 4 a derivation of the binomial density function 
was given, and in the exercises of Chap. 5 the density functions and some 
characteristics of the hypergeometric and multinomial distributions were 
presented. Several applications of the binomial distribution were given 
in Chap. 6, and it was shown how the computations can be reduced consider- 
ably by using the normal approximation in many cases. We have not yet 
considered problems associated with counting objects which may fall in 
more than two categories. In order to focus our attention on the nature of 
a typical problem, we first consider the familiar case with only two categories 
which are mutually exclusive. 

Example 15.1. A librarian at a college claims that 20 per cent of the books are catalogued as science books. Suppose that during this academic year 750 of 3500 new books bought are science books. We wish to know if the proportion of science books bought this year is significantly different from the proportion bought during all the preceding years.

We assume that the "science" and "nonscience" categories are mutually exclusive. Let $p = 0.20$ denote the true proportion of science books catalogued before this year, and $1 - p = q = 0.80$ the true proportion of nonscience books. If books during the present year were bought in the same proportion as in former years, then 3500(0.20) = 700 would be science and 3500(0.80) = 2800 nonscience books. We wish to know if the numbers of books actually bought, 750 and 3500 - 750 = 2750, are significantly different from the numbers the librarian might expect to be bought, 700 and 2800. Clearly, this is a dichotomous situation in which the binomial distribution with $p = 0.20$ and $n = 3500$ should be applied. In this example we wish to know the probability of a random sample of size 3500 being as extreme or more extreme than the one selected. If the probability is less than 0.05, say, we conclude that the sample proportion of science (and nonscience) books is significantly different from 0.20 (0.80); otherwise, we reserve judgment, that is, we conclude that we do not have enough evidence to say that the true proportion is different from 0.20.

Under the above assumptions the probability of buying 750 or more science books is given by

$$P[x = 750, 751, \ldots, 3500;\ \text{binomial with } p = 0.20 \text{ and } n = 3500]$$

$$\approx P[x \ge 749.5;\ \text{normal with } \mu = np = 700 \text{ and } \sigma = \sqrt{npq} = 23.66]$$

$$= P[z \ge 2.09] = 0.0183 \tag{15.1}$$

Since this is a two-sided test, we must also find the probability of buying 650 or fewer science books. Due to symmetry of the normal distribution this probability is the same as that given by Eq. (15.1). Thus, the required probability is 2(0.0183) = 0.0366. With such a small probability we conclude that the true proportion must be different from 0.20. Actually, in this case we conclude that the proportion of science books bought in the last year is significantly larger than 0.20, understanding that there is a 2.5 per cent chance of making a type I error.
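The normal approximation of Eq. (15.1) can be compared with the exact binomial tail by direct summation. The sketch below is only a check on the arithmetic; the function `binomial_upper_tail` is a helper of our own, and the standard normal tail area is obtained from the complementary error function of Python's `math` module.

```python
import math

n, p, observed = 3500, 0.20, 750
q = 1 - p
mu = n * p
sigma = math.sqrt(n * p * q)

# Normal approximation with the continuity correction used in Eq. (15.1).
z = (observed - 0.5 - mu) / sigma
upper_tail_normal = 0.5 * math.erfc(z / math.sqrt(2))
two_sided_normal = 2 * upper_tail_normal

def binomial_upper_tail(n, p, k):
    """Exact P[X >= k] for X ~ Binomial(n, p); each term is computed in log space."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for x in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                    + x * log_p + (n - x) * log_q)
        total += math.exp(log_term)
    return total

upper_tail_exact = binomial_upper_tail(n, p, observed)
print(round(z, 2), round(upper_tail_normal, 4), round(upper_tail_exact, 4))
```

The exact tail area should come out slightly larger than the normal value, since the binomial distribution with $p = 0.20$ is skewed to the right, but the two agree closely enough that the conclusion of the example is unaffected.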

To make the test more general, suppose $n$ random observations are classified into two mutually exclusive categories $C_1$ and $C_2$ with corresponding true proportions $p_1 = p$ and $p_2 = 1 - p = q$. Let $o_1$ and $o_2 = n - o_1$ denote the frequencies of the sample values, also called observed frequencies, in categories $C_1$ and $C_2$, respectively, and let $e_1$ and $e_2 = n - e_1$ denote the theoretical or expected frequencies. Now, since $o_1$ is a random variable with a binomial distribution, we know

$$\frac{o_1 - np_1}{\sqrt{np_1p_2}} \tag{15.2}$$

is approximately normally distributed with mean zero and variance one, provided both $np_1 > 5$ and $np_2 > 5$ when $n$ is large. [If $n$ is not large, we replace $o_1$ (in 15.2) by $|o_1 - 1/2|$.] Thus

$$\frac{(o_1 - np_1)^2}{np_1(1 - p_1)} \tag{15.3}$$

is approximately distributed as $\chi^2$ with one degree of freedom, provided $np_1 > 5$ and $np_2 > 5$. Further, since

$$\frac{(o_1 - e_1)^2}{e_1} + \frac{(o_2 - e_2)^2}{e_2} = \frac{(o_1 - np_1)^2}{np_1} + \frac{[n - o_1 - n(1 - p_1)]^2}{n(1 - p_1)}$$

$$= (o_1 - np_1)^2\left[\frac{1 - p_1 + p_1}{np_1(1 - p_1)}\right]$$

$$= \frac{(o_1 - np_1)^2}{np_1p_2}$$

it follows that

$$\sum_{i=1}^{2}\frac{(o_i - e_i)^2}{e_i} = \chi'^2 \tag{15.4}$$

is approximately distributed as $\chi^2$ with one degree of freedom, provided $np_i > 5$ $(i = 1, 2)$. The expression on the left-hand side of Eq. (15.4) is denoted by $\chi^2$ in most places, but we use $\chi'^2$ (chi prime square) to indicate that it is only approximately distributed as $\chi^2$. The reader will recognize that Eq. (15.4) has certain computational advantages over (15.2), and that it is preferred to (15.3) because of its symmetry in terms of frequencies. But the primary reason for introducing Eq. (15.4) is that it can be easily generalized to any number of categories.
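The equivalence between the two-category sum in Eq. (15.4) and the squared standardized variable in (15.3) is easy to confirm on the library figures of Example 15.1. The sketch below uses those figures without the continuity correction; the variable names are ours.

```python
n, p1 = 3500, 0.20
p2 = 1 - p1
o1 = 750
o2 = n - o1
e1, e2 = n * p1, n * p2

# Left-hand side of Eq. (15.4): sum over the two categories.
chi_prime_sq = (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2

# Expression (15.3): square of the standardized binomial variable.
z_sq = (o1 - n * p1) ** 2 / (n * p1 * p2)

print(round(chi_prime_sq, 4), round(z_sq, 4))
```

Both expressions give the same number, which is the point of the algebra above; the category form has the advantage that it extends directly to more than two categories.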

Suppose that all values of a population (discrete or continuous, quantitative or qualitative) fall in $k$ mutually exclusive categories $C_1, C_2, \ldots, C_k$. Let $p_i$ denote the true proportion of values falling in category $C_i$ $(i = 1, \ldots, k)$, where

$$\sum_{i=1}^{k} p_i = 1$$

In a random sample of $n$ observations let $o_i$ and $e_i = np_i$ denote the observed and expected frequency in category $C_i$, where

$$\sum_i o_i = \sum_i e_i = n$$

Then it can be shown (see Exercise 15.18) that

$$\chi'^2 = \sum_{i=1}^{k}\frac{(o_i - e_i)^2}{e_i} \tag{15.5}$$

is approximately distributed as $\chi^2$ with $k - 1$ degrees of freedom. This statistic is used in testing the hypothesis that the theoretical (or expected) frequencies are $e_1, \ldots, e_k$ against the alternative that at least one is not as specified. The test procedure is illustrated in the following example.

Example 15.2. Suppose that in addition to the information given in Example 15.1 we are told that 30 per cent are humanities, 35 per cent social science, and 15 per cent general books. Further, suppose that 1000 new humanities, 1200 new social science, and 550 new general books are bought. We wish to determine if the proportion of new books in the four categories is significantly different from the expected proportions of $p_1 = 0.20$, $p_2 = 0.30$, $p_3 = 0.35$, and $p_4 = 0.15$, where the subscripts 1, 2, 3, 4 denote science, humanities, social science, and general, respectively.

If the null hypothesis is true, the expected number of new books in the categories are

$$e_1 = 700, \quad e_2 = 1050, \quad e_3 = 1225, \quad e_4 = 525$$

The observed frequencies are

$$o_1 = 750, \quad o_2 = 1000, \quad o_3 = 1200, \quad o_4 = 550$$

Thus, we find that the computed value of $\chi'^2$ using Eq. (15.5) is

$$\frac{(750 - 700)^2}{700} + \frac{(1000 - 1050)^2}{1050} + \frac{(1200 - 1225)^2}{1225} + \frac{(550 - 525)^2}{525}$$

$$= 3.57 + 2.38 + 0.51 + 1.19 = 7.65$$

Now, when $\chi'^2 = 0$ the observed frequencies are identical to the expected frequencies, and as $\chi'^2$ increases, the amount the observed frequencies deviate from the expected frequencies also increases. Thus, the upper tail of the $\chi^2$ distribution with $k - 1$ degrees of freedom is used to test the hypothesis. Since $\chi^2_{.05}(3) = 7.82$, $\chi'^2 = 7.65$ barely falls in the noncritical region. Thus, we do not have enough evidence to say the proportion of new books in the four categories differs significantly from those catalogued.
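The goodness-of-fit computation of Example 15.2 takes only a few lines. The sketch below assumes the observed counts and hypothesized proportions of the example and the tabled 5 per cent point of $\chi^2$ with 3 degrees of freedom quoted there; the names are ours.

```python
observed = [750, 1000, 1200, 550]        # science, humanities, social science, general
proportions = [0.20, 0.30, 0.35, 0.15]   # hypothesized proportions
n = sum(observed)

# Expected frequencies e_i = n * p_i under the null hypothesis.
expected = [n * p for p in proportions]

# Contributions of each category and the statistic of Eq. (15.5).
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_prime_sq = sum(contributions)

critical = 7.82                          # upper 5 per cent point of chi-square, 3 d.f.
reject = chi_prime_sq > critical
print([round(c, 2) for c in contributions], round(chi_prime_sq, 2), reject)
```

Listing the separate contributions is often useful in practice, since it shows which categories account for most of the discrepancy; here the science category contributes the most.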
In both these examples the $\chi^2$ distribution has been used to determine whether the observed frequencies match the expected frequencies. Thus, we call this the $\chi^2$ goodness-of-fit test. It should be clear that applications are almost as extensive as problems in which counting occurs. For example, the $\chi^2$ goodness-of-fit test may be used in

1. Almost any opinion poll

2. Checking characteristics of insects against theoretical values

3. Comparing defective parts produced by different machines

4. Studies of almost any human characteristic: amount of education, income, color of hair, occupation

5. Basic research when only a relatively small finite number of categories are meaningful or when rapid results are needed

6. Studies that affect us almost every day: traffic, weather, amount of sleep, health, news, working conditions

If counts are made in two or more categories, the $\chi^2$ goodness-of-fit test may be applied, provided $n$ is sufficiently large. Rules controlling the minimum size of $e_i$ vary. It appears [4, 5, 7, 14, 28, 35, 37] that as many as 20 per cent of the categories may have expected frequencies between one and five, but not less than one, and still have $\chi'^2$ values which are closely approximated by the proper $\chi^2$ distribution.

It should be noted that the only assumption relating to the categories 
is that they be mutually exclusive. Otherwise, the selection of categories is 
completely arbitrary. As a matter of fact, a goodness-of-fit technique for 
one or more variables could have been applied in most of the studies already 
made. As an illustration, in Example 12.1 we could have arbitrarily defined 
the three categories to be “very acid,” “average acid,” and “low acid,” and 
have given a rule for deciding in which category a given sample value falls. 
Of course, in this particular experiment we would lose some precision, 
since acidity can be more accurately measured but we might gain something 
in time and simplicity. One of the great advantages is that we do not need 
to make normality assumptions in the $\chi^2$ goodness-of-fit test. Of course,
care should be taken to avoid the areas in which mistakes in application 
are made [3, 7, 21]. 

Great caution should be used in stating conclusions resulting from the 
goodness-of-fit test. One should never accept the null hypothesis solely on 
the evidence of the test. This is illustrated in Example 15.3, and further 
discussion is given at the end of the example. 


15.2. NATURE OF THE GOODNESS-OF-FIT STATISTIC

In Exercise 5.24 we gave the multinomial density function

$$f(x_1, \ldots, x_k) = \frac{n!}{x_1!\cdots x_k!}\,p_1^{x_1}\cdots p_k^{x_k}$$

where $x_i$ denotes the frequency of observations in the $i$th category in which the true proportion of observations is $p_i$ $(i = 1, \ldots, k)$. It was assumed that the categories are mutually exclusive, that the observations are randomly selected, and that $p_1 + \cdots + p_k = 1$. Clearly, the multinomial density function gives exact probabilities for any set of observed frequencies

$$x_1 = o_1, \quad x_2 = o_2, \quad \ldots, \quad x_k = o_k$$

Thus, the reader, given enough time, could compute the exact probability of a pattern of observations' being as extreme or more extreme from those hypothesized than those observed. Since the computations involved would be so lengthy, an approximation is very desirable.

If one repeatedly draws random samples of size n from a population 
divided into k categories with proportions $p_1, \ldots, p_k$, the observed frequency 
$o_i$ in the $i$th category is a random variable with mean $np_i$ and variance 
$np_i(1 - p_i)$. Thus, the standardized random variable 

$$\frac{o_i - np_i}{\sqrt{np_i(1 - p_i)}} \qquad (i = 1, \ldots, k) \tag{15.6}$$

is approximately normally distributed with mean zero and variance one, 
provided both $np_i$ and $n(1 - p_i)$ are not less than five and n is sufficiently 
large. Now, if $o_1, \ldots, o_k$ were independently distributed, the statistic 

$$\sum_{i=1}^{k} \frac{(o_i - np_i)^2}{np_i(1 - p_i)} \tag{15.7}$$

would be approximately distributed as $\chi^2$ with k degrees of freedom. The 
fact that the $o_i$ are correlated and are restricted by the relation $\sum o_i = n$ 
makes it necessary that (15.7) be replaced by another statistic. It can be 
shown [6, 9, 10, 11, 13, 16, 26, 30] that the appropriate statistic is 

$$\chi'^2 = \sum_{i=1}^{k} \frac{(o_i - np_i)^2}{np_i} \tag{15.8}$$

which is approximately distributed as $\chi^2$ with k - 1 degrees of freedom, 
provided $np_i > 5$ for every i. 

The use of Eq. (15.8), as we have seen in two examples, requires that 
$e_i = np_i$ be known or be specified by the hypothesis, that is, requires that 
the theoretical distribution be known. Usually this is not the case. Instead, 
the observed frequencies are used to find maximum likelihood estimators, 
$\hat{p}_i$, of the parameters $p_i$, which replace the parameters in Eq. (15.8). Further, 
it can be shown [8, 10, 11, 12, 25, 27, 29, 31, 32] that the statistic 

$$\chi'^2 = \sum_{i=1}^{k} \frac{(o_i - n\hat{p}_i)^2}{n\hat{p}_i} \tag{15.9}$$

is distributed approximately as $\chi^2$ with k - 1 - c degrees of freedom, 
where c denotes the number of independent parameters of a distribution 
which must be estimated in order to determine the estimators $\hat{p}_i$. This is 
illustrated in the following example. 

Example 15.3. Use Eq. (15.9) to test how well the teak tree data of 
Example 2.10 fit a normal curve. 

The type of family of distributions is assumed to be known, but two 
parameters, $\mu$ and $\sigma^2$, must be estimated in order to determine which member 
of this family should be used as a fit for the data. In Example 3.10 the 
estimates are given as 

$$\hat{\mu} = 21.69 \quad \text{and} \quad \hat{\sigma}^2 = 34.5156$$

and the theoretical frequencies for each interval are computed. The observed 
and theoretical frequencies for the intervals specified in Exercise 2.10 are 
given in Table 15.1, along with other values which are useful in the 
computation of $\chi'^2$. (If a fully automatic desk calculator is available, $\chi'^2$ can be 
computed directly from $o_i$ and $e_i$, and other recordings are then unnecessary.) 

Table 15.1 

Calculations for the Teak Tree Data of Exercise 2.10 

Diameter of       Observed     Theoretical          Computations 
Tree in Inches    Frequency    Frequency     $o_i - e_i$    $(o_i - e_i)^2/e_i$ 

 4.5- 7.5             8           8.5*          -0.5           0.029 
 7.5-10.5            26          22.7            3.3           0.480 
10.5-13.5            50          58.3           -8.3           1.182 
13.5-16.5           120         116.5            3.5           0.105 
16.5-19.5           181         180.9            0.1           0.000 
19.5-22.5           215         217.6           -2.6           0.031 
22.5-25.5           213         202.9           10.1           0.503 
25.5-28.5           145         146.7           -1.7           0.020 
28.5-31.5            76          82.1           -6.1           0.453 
31.5-34.5            36          35.8            0.2           0.001 
34.5-37.5            18          15.9†           2.1           0.277 

Total                                                          3.08 

* Includes area to the left of 4.5. 
† Includes area to the right of 37.5. 

In any case 

$$\chi'^2 = 3.08$$

From the $\chi^2$ distribution with 11 - 1 - 2 = 8 degrees of freedom, we find 
$\chi^2_{.05}(8) = 15.5$. Since 3.08 is less than 15.5, we fail to reject the null hypothesis 
that the data were drawn from a normal population with mean $\mu = 21.69$ 
and variance $\sigma^2 = 34.5156$. However, it would be a gross error to conclude 
that the data came from this distribution. Indeed, as indicated in Fig. 15.1, 
it could have come from any of the numerous distributions which have 
roughly the same proportions in the various categories. But we may conclude 
that the data came from a distribution not greatly different from the 
particular normal distribution specified by the null hypothesis. 
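
The arithmetic of Example 15.3 can be checked with a short script. The following is only a sketch: the frequencies are copied from Table 15.1, the critical value 15.5 is the one quoted in the text, and the function and variable names are illustrative choices, not part of the original treatment. 

```python
# Observed and theoretical frequencies copied from Table 15.1 (teak tree data).
observed = [8, 26, 50, 120, 181, 215, 213, 145, 76, 36, 18]
expected = [8.5, 22.7, 58.3, 116.5, 180.9, 217.6, 202.9, 146.7, 82.1, 35.8, 15.9]


def chi_square_statistic(obs, exp):
    """Sum of (o_i - e_i)^2 / e_i over all categories, as in Eq. (15.9)."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


# Two parameters (mean and variance) were estimated from the data,
# so the degrees of freedom are k - 1 - c with c = 2.
estimated_parameters = 2
df = len(observed) - 1 - estimated_parameters

statistic = chi_square_statistic(observed, expected)
critical_value = 15.5  # five per cent point for 8 degrees of freedom, from the text

print(f"chi-square = {statistic:.2f} with {df} degrees of freedom")
print("reject H0" if statistic > critical_value else "fail to reject H0")
```

Computing the statistic directly from the two frequency columns mirrors the remark about the desk calculator: the intermediate columns of Table 15.1 are a convenience, not a necessity. 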



Since the intervals may be selected in any fashion so long as they are 
mutually exclusive and exhaustive, the experimenter (investigator) has 
unlimited choices in selecting them. The areas under the curve and above 
the intervals represent the true proportions $p_1, \ldots, p_k$ in the continuous 
case. (In the discrete case the proportions are the lengths of line segments.) 
Figure 15.2 illustrates these statements for the continuous case. Note that 
the $\chi'^2$ test may be used if two or more categories are combined to form a new 
category, provided the proper adjustment is made in the number of degrees 
of freedom. 

Fig. 15.2. Illustration of the Arbitrary Way in which Intervals may be Selected 

The properties for the single variable of classification generalize to more 
than one variable of classification. A very simple illustration is found in 
the Mendelian theory of inheritance [23]. Suppose that a characteristic is 
inherited in the ratio 3 to 1, written 3:1. That is, in the long run, the 
characteristic is dominant in three-fourths of the progeny and is recessive in 
the remaining one-fourth. Then when two characteristics, both having the 
ratio 3:1, are crossed, the progeny should have, according to the Mendelian 
theory, frequencies in the ratios 9:3:3:1. That is, 9/16 should have both 
dominant characteristics, 3/16 a dominant first characteristic and recessive 
second characteristic, 3/16 a recessive first characteristic and a dominant 
second characteristic, and 1/16 both recessive characteristics. The 
goodness-of-fit test for such a two-way classification is illustrated in the 
following example. 

Example 15.4. In a certain hypothetical experiment peas were crossed 
with the result that 220 were round and yellow, 78 round and green, 71 
wrinkled and yellow, and 31 wrinkled and green. Do these frequencies 
conform to the theoretical assumption that "round" is the dominant shape 
and "yellow" is the dominant color such that the true ratios are 9:3:3:1? 

The calculations are shown in Table 15.2. Since the theory gives the 
theoretical frequencies in each of the categories, there are 4 - 1 = 3 degrees 
of freedom for the $\chi'^2$ statistic. The critical value is $\chi^2_{.05}(3) = 7.81$. 
Since 1.88 is less than 7.81, we fail to reject the hypothesis and conclude 
that we do not have enough evidence to say that the Mendelian theory fails. 


Table 15.2 

Computations for the Pea Experiment in Example 15.4 

                             Frequency 
Categories              Observed   Theoretical    $o_i - e_i$    $(o_i - e_i)^2/e_i$ 

Round and yellow           220         225            -5            0.111 
Round and green             78          75             3            0.120 
Wrinkled and yellow         71          75            -4            0.213 
Wrinkled and green          31          25             6            1.440 

Totals                     400         400                          1.884 
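
As a check on Table 15.2, the theoretical frequencies and the statistic can be generated from the 9:3:3:1 ratios alone. This is a minimal sketch; the dictionary layout and names are assumptions made for illustration, and the critical value 7.81 is the one quoted in Example 15.4. 

```python
# Observed counts from Example 15.4 and the Mendelian ratios 9:3:3:1.
observed = {
    "round and yellow": 220,
    "round and green": 78,
    "wrinkled and yellow": 71,
    "wrinkled and green": 31,
}
ratio = {
    "round and yellow": 9,
    "round and green": 3,
    "wrinkled and yellow": 3,
    "wrinkled and green": 1,
}

n = sum(observed.values())
ratio_total = sum(ratio.values())

# Theoretical frequency e_i = n * p_i, with p_i fully specified by the theory.
expected = {name: n * ratio[name] / ratio_total for name in observed}

statistic = sum(
    (observed[name] - expected[name]) ** 2 / expected[name] for name in observed
)
df = len(observed) - 1  # no parameters are estimated from the data

for name in observed:
    print(f"{name:20s} o = {observed[name]:3d}  e = {expected[name]:5.1f}")
print(f"chi-square = {statistic:.3f} with {df} degrees of freedom")
```

Because the hypothesis specifies every proportion, no degrees of freedom are lost to estimation, which is why the count is 4 - 1 = 3. 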


The number of characteristics can be extended indefinitely, and we can 
still use the $\chi'^2$ statistic as an approximation to the $\chi^2$ distribution to test 
the goodness of fit of observed frequencies against those hypothesized by 
the theory. Further, this procedure can be extended to any experiment (or 
investigation) in which theory gives the expected number of observations 
in each category. 

15.3. TEST OF INDEPENDENCE IN TWO-WAY TABLES 

When n observations are tabulated according to two variables of 
classification, rows and columns, say, we are often interested in determining 
whether the variables are associated or not. Such two-way tables of 
frequencies are often called contingency tables. The null hypothesis subject to test 
is that the two classifications are independent, that is, the probability that 
an observation falls in a particular row (column) is not affected by the 
particular column (row) to which it belongs. If the null hypothesis is 
rejected, the two variables of classification are said to be dependent or 
correlated. The test of independence is illustrated in the following example. 

Example 15.5. A random sample of 200 students in a college were 
asked the question, "Do you think scientists are slightly unbalanced people?" 
The results are shown in Table 15.3. We wish to know if the proportion of 
students saying "yes" is independent of class in school, that is, is the same 
for each class. 


Table 15.3 

Number of Students Classified According to Question Response and 
Class in School 

          Freshmen   Sophomore   Junior   Senior   Total 

Yes          15          8          5        2       30 
No           58         46         34       32      170 

Total        73         54         39       34      200 


A statistic, $\chi'^2$, of the type defined by Eq. (15.9) is used. Thus, estimators 
of cell proportions (probabilities) must be found. 

If the number of "yes" responses to the question is independent of class 
in school, then the proportion of freshmen saying "yes" should be the same 
as the proportion in any other class. That is, the true proportion of freshmen 
saying "yes," $p_{11}$, should be the same as the true proportion of "yes" answers 
in all classes, $p_{.1}$. Thus, an estimator of the expected number of freshmen 
saying "yes," $e_{11}$, is the product of the number of freshmen, $n_{1.}$, times an 
estimator of the proportion in all classes saying "yes," $\hat{p}_{.1}$; that is, 

$$e_{11} = n_{1.}\,\hat{p}_{.1} = 73\left(\frac{30}{200}\right) = 10.95 \approx 11.0$$

Estimators of the other expected cell frequencies are found in a similar way 
and shown in Table 15.4. For example, 

$$e_{21} = n_{2.}\,\hat{p}_{.1} = 54\left(\frac{30}{200}\right) = 8.1$$

Table 15.4 

Estimates of Expected Frequencies for Table 15.3 when Variables of Classification 
are Independent 

          Freshmen   Sophomore   Junior   Senior   Total 

Yes         11.0        8.1        5.9      5.1     30.1 
No          62.0       45.9       33.1     28.9    169.9 

Total       73         54         39       34      200 


The totals in Table 15.4 should be the same as those in Table 15.3 except 
for rounding-off errors. This being the case, it is clear that, once three 
selected estimates, $e_{11}$, $e_{21}$, $e_{31}$, say, are determined by the above method, all 
others can be found by subtraction. Now we find 

$$\chi'^2 = \frac{(15 - 11.0)^2}{11.0} + \frac{(8 - 8.1)^2}{8.1} + \cdots + \frac{(32 - 28.9)^2}{28.9} = 4.09$$

$\chi^2_{.05}(3) = 7.81$. Hence, we fail to reject the hypothesis of independence. 
That is, there is not enough evidence to say that the number of "yes" 
responses depends on the class in school. 
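
The computation in Example 15.5 can be sketched in a few lines. The code below builds each estimated frequency as (row total)(column total)/n from the marginal totals of Table 15.3; the function names are illustrative assumptions. Note that it keeps the expected frequencies unrounded, so it gives about 4.13 rather than the 4.09 obtained above from the one-decimal values of Table 15.4; the conclusion is unchanged. 

```python
# Table 15.3: rows are the responses (yes, no); columns are the four classes.
table = [
    [15, 8, 5, 2],     # yes
    [58, 46, 34, 32],  # no
]


def expected_frequencies(counts):
    """Estimated frequency for each cell: (row total)(column total) / n."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square_independence(counts):
    """Return the statistic and its (rows - 1)(columns - 1) degrees of freedom."""
    exp = expected_frequencies(counts)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(counts, exp)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(counts) - 1) * (len(counts[0]) - 1)
    return stat, df


expected = expected_frequencies(table)
statistic, df = chi_square_independence(table)

print([round(e, 2) for e in expected[0]])  # unrounded "yes" row of Table 15.4
print(f"chi-square = {statistic:.2f} with {df} degrees of freedom")
```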

In general, let c mutually exclusive categories of one variable (attribute) 
be arranged in columns and r mutually exclusive categories of a second 
variable (attribute) be arranged in rows so that the resulting cells represent the 
cr mutually exclusive categories of joint attributes. The cell in the $i$th column 
and $j$th row represents the joint attribute of the $i$th category of variable 
one and the $j$th category of variable two (i = 1, ..., c; j = 1, ..., r). Let 
$p_{ij}$ denote the probability of an object selected at random falling in the cell 
in the $i$th column and $j$th row, 

$$p_{i.} = \sum_j p_{ij}$$

denote the probability of an object's falling in the $i$th column, and 

$$p_{.j} = \sum_i p_{ij}$$

denote the probability of an object's falling in the $j$th row. The null hypothesis 
that the two variables are independent may then be written as 

$$H_0\colon\; p_{ij} = p_{i.}\,p_{.j} \qquad (i = 1, \ldots, c;\; j = 1, \ldots, r) \tag{15.10}$$

If n objects are randomly selected and $n_{ij}$ fall in the cell in the $i$th column 
and $j$th row, then, according to Eq. (15.8), the statistic 

$$\chi'^2 = \sum_i \sum_j \frac{(n_{ij} - np_{ij})^2}{np_{ij}} \tag{15.11}$$

is approximately distributed as $\chi^2$ with cr - 1 degrees of freedom, provided 
$np_{ij} > 5$ for every pair of i and j. Under the assumption of the null hypothesis 
(15.10) the statistic in Eq. (15.11) may be written as 

$$\chi'^2 = \sum_i \sum_j \frac{(n_{ij} - np_{i.}p_{.j})^2}{np_{i.}p_{.j}} \tag{15.12}$$

But Eq. (15.12) is not very useful, since the probabilities $p_{i.}$ and $p_{.j}$ are 
seldom known. 

If we let 

$$n_{i.} = \sum_j n_{ij}, \qquad n_{.j} = \sum_i n_{ij}$$

it is easy to show that 

$$\hat{p}_{i.} = \frac{n_{i.}}{n} \quad (i = 1, \ldots, c), \qquad \hat{p}_{.j} = \frac{n_{.j}}{n} \quad (j = 1, \ldots, r) \tag{15.13}$$

are maximum likelihood estimators of $p_{i.}$ and $p_{.j}$, respectively. Thus, an 
estimator of the expected frequency in the cell in the $i$th column and $j$th 
row is given by 

$$e_{ij} = n\,\hat{p}_{i.}\,\hat{p}_{.j}$$

or 

$$e_{ij} = \frac{(n_{i.})(n_{.j})}{n} \qquad (i = 1, \ldots, c;\; j = 1, \ldots, r) \tag{15.14}$$

Since 

$$\sum_i p_{i.} = \sum_j p_{.j} = 1$$

there are only (c - 1) + (r - 1) = c + r - 2 independent parameters, and 
only c + r - 2 independent estimators used in finding the cr estimators 
$e_{11}, \ldots, e_{cr}$. Thus, on substituting the estimator (15.14) for $np_{i.}p_{.j}$ in 
Eq. (15.12), we obtain the statistic 

$$\chi'^2 = \sum_i \sum_j \frac{(n_{ij} - e_{ij})^2}{e_{ij}} \tag{15.15}$$

or 

$$\chi'^2 = n\left[\sum_i \sum_j \frac{n_{ij}^2}{n_{i.}\,n_{.j}} - 1\right] \tag{15.15a}$$

which is distributed approximately as $\chi^2$ with cr - 1 - (c + r - 2) = 
(c - 1)(r - 1) degrees of freedom. For further study the student may check 
Refs. [6, 9, 10, 12, 16, 18, 34]. The application of this statistic has already 
been given in Example 15.5. 


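Table 15.5 

Observed and Theoretical (Estimated) Frequencies in a Contingency Table 

                                  Variable One Categories 
                      1            ...        i            ...        c           Totals 

              1   $n_{11}(e_{11})$   ...   $n_{i1}(e_{i1})$   ...   $n_{c1}(e_{c1})$   $n_{.1}$ 
Variable      .        .                      .                      .              . 
Two           j   $n_{1j}(e_{1j})$   ...   $n_{ij}(e_{ij})$   ...   $n_{cj}(e_{cj})$   $n_{.j}$ 
Categories    .        .                      .                      .              . 
              r   $n_{1r}(e_{1r})$   ...   $n_{ir}(e_{ir})$   ...   $n_{cr}(e_{cr})$   $n_{.r}$ 

Totals            $n_{1.}$           ...   $n_{i.}$           ...   $n_{c.}$           n 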

Table 15.5 is given as an aid in computing $\chi'^2$ and in understanding the 
relations among the observed and expected frequencies. The entries and 
computations in a problem are made in the following order. Record all 
observed frequencies and then find marginal totals. Then use the marginal 
frequencies and Eq. (15.14) to find the estimated theoretical frequencies. 
If a fully automatic desk calculator is available, $\chi'^2$ can be found without 
any further intermediate recording. If such a machine is not available, 
intermediate recordings like those in Table 15.2 should be made. In any 
case, Table 15.5 is very useful in making a rapid estimate of the significance 
of the two variables of classification. 

The test of mutual independence of more than two variables (attributes) 
is similar to the one explained for the two-variable case. We indicate the 
procedure with three variables. Suppose variables U, V, and W are classified 
into c columns, r rows, and l layers, respectively, so that there are crl cells 
in a three-way table. Let $n_{ijk}$ denote the number of objects (from among 
n random objects) falling in the cell in the $i$th column, $j$th row, and $k$th layer. 
Then it can be shown that the statistic 

$$\chi'^2 = \sum_i \sum_j \sum_k \frac{(n_{ijk} - e_{ijk})^2}{e_{ijk}} \tag{15.16}$$

is approximately distributed as $\chi^2$ with crl - c - r - l + 2 degrees of 
freedom, where 

$$e_{ijk} = \frac{n_{i..}\,n_{.j.}\,n_{..k}}{n^2}$$

$$n_{i..} = \sum_j \sum_k n_{ijk}, \qquad n_{.j.} = \sum_i \sum_k n_{ijk}, \qquad n_{..k} = \sum_i \sum_j n_{ijk}$$

$$n = \sum_i \sum_j \sum_k n_{ijk}$$

The statistic (15.16) is applied in the usual way to test the mutual 
independence of the variables U, V, and W. Other hypotheses [1, 2, 17, 24, 36] may 
also be of interest. For example, one may wish to test whether variable 
U is independent of V and W. 

15.4. SPECIAL CASES OF TESTS OF INDEPENDENCE 

If there are only two categories for one variable of classification, the 
computation of the statistic $\chi'^2$ may be simplified considerably. If c = r = 2, 
then Eq. (15.15) reduces to 

$$\chi'^2 = \frac{n(n_{11}n_{22} - n_{12}n_{21})^2}{n_{1.}\,n_{2.}\,n_{.1}\,n_{.2}} \tag{15.17}$$

which is approximately distributed as $\chi^2$ with one degree of freedom, 
provided each estimated theoretical frequency $e_{ij}$ (i, j = 1, 2) is large. If some 
$e_{ij}$ is small, the approximation is improved considerably by applying Yates' 
correction [9, 37]. This is done by adding 1/2 to the smallest observed frequency 
and keeping the marginal totals the same. Thus, the corrected statistic, 
$\chi_c'^2$, is computed by 

$$\chi_c'^2 = \frac{n\left(|n_{11}n_{22} - n_{12}n_{21}| - \frac{n}{2}\right)^2}{n_{1.}\,n_{2.}\,n_{.1}\,n_{.2}} \tag{15.18}$$

At best, the application of the above statistic still leads to approximate 
results. Hence, in some doubtful cases the exact procedure [12] should be 
applied, even though it requires considerably more calculation. The method 
is also useful in arriving at a better understanding of the inferences resulting 
from the use of the $\chi'^2$ or $\chi_c'^2$ statistic. It can be shown that, for fixed 
marginal totals, the probability of a set of observed values $n_{11}$, $n_{12}$, $n_{21}$, $n_{22}$ 
in a 2 x 2 contingency table is 

$$P[n_{11}, n_{12}, n_{21}, n_{22}] = \frac{n_{1.}!\; n_{2.}!\; n_{.1}!\; n_{.2}!}{n!\; n_{11}!\; n_{12}!\; n_{21}!\; n_{22}!} \tag{15.19}$$

An application of the exact method is given in the following example. 

Example 15.6. On exposure to a certain disease 20 people responded 
according to the frequencies shown in Table 15.6. We wish to know, at the 
five per cent level, whether the inoculation is effective. 


Table 15.6 

Results of Exposure to a Disease (Observed Frequencies) 

                   Attacked   Not Attacked   Total 

Not inoculated        10           3           13 
Inoculated             2           5            7 

Total                 12           8           20 

First, using Eq. (15.18), we find that 

$$\chi_c'^2 = \frac{20(|10 \cdot 5 - 2 \cdot 3| - 10)^2}{12 \cdot 8 \cdot 13 \cdot 7} = 2.65$$

Since 2.65 is less than $\chi^2_{.05}(1) = 3.84$, we fail to reject the hypothesis of 
independence. That is, we do not have enough evidence to say that the 
inoculation is effective. 

Applying Eq. (15.19) and the method of exact probabilities, we reason 
as follows. If inoculation makes no difference in the proportion of people 
contracting the disease, then 

$$\frac{(12)(7)}{20} = 4.2$$

of the seven inoculated people would be attacked and 

$$\frac{(8)(7)}{20} = 2.8$$

would not be attacked. In our example, only two people who were inoculated 
contracted the disease. Thus, only in cases with observed frequencies of 
zero or one would the number of inoculated people contracting the disease be more 
extreme. That is, only the two sets of frequencies shown in Table 15.7 would 
be more extreme from expectation than those of Table 15.6. 

Table 15.7 

Sets of Frequencies with Totals Fixed 

                   Attacked   Not Attacked   Total     Attacked   Not Attacked   Total 

Not inoculated        11           2           13         12           1           13 
Inoculated             1           6            7          0           7            7 

Total                 12           8           20         12           8           20 

Thus, the exact probability of a set of frequencies' being as extreme or more 
extreme than those observed is given by 

$$P[10, 3, 2, 5] + P[11, 2, 1, 6] + P[12, 1, 0, 7] = 0.0521$$

Assuming the hypothesis of independence to be true, we conclude that 
5.21 per cent of the sets (patterns) of frequencies are as extreme or more 
extreme than the set in Table 15.6. Thus, at the five per cent level, we barely 
fail to reject the hypothesis of independence. 
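
The exact probability quoted in Example 15.6 can be reproduced by evaluating Eq. (15.19) for each of the three patterns. The sketch below writes Eq. (15.19) in its equivalent binomial-coefficient form; the function name and argument order are illustrative assumptions. 

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of the 2 x 2 pattern [[a, b], [c, d]] with all margins fixed.

    Algebraically the same as Eq. (15.19):
    (a+b)! (c+d)! (a+c)! (b+d)! / (n! a! b! c! d!).
    """
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


# The observed pattern of Table 15.6 and the two more extreme patterns of
# Table 15.7, each listed as (not inoculated: attacked, not attacked;
# inoculated: attacked, not attacked).
patterns = [(10, 3, 2, 5), (11, 2, 1, 6), (12, 1, 0, 7)]

p_value = sum(table_probability(*pattern) for pattern in patterns)
print(f"exact probability = {p_value:.4f}")
```

Using binomial coefficients rather than raw factorials keeps the intermediate numbers small, which matters little here but helps with larger tables. 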
If we had used Eq. (15.17), so that 

$$\chi'^2 = \frac{20(10 \cdot 5 - 2 \cdot 3)^2}{12 \cdot 8 \cdot 13 \cdot 7} = 4.43$$

then we would have rejected the hypothesis of independence, in which case 
the conclusion is different from that obtained by applying the exact procedure 
or Yates' correction. However, the conclusions resulting from the exact 
procedure and the application of $\chi_c'^2$ do not always agree. This is illustrated 
in the following paragraph. 

Remembering that 

$$P[\chi'^2 > k] = 2P[u > \sqrt{k}] \tag{15.20}$$

where k is any positive constant and $u = \sqrt{\chi^2}$ is the standard normal deviate, 
we have 

$$P[\chi'^2 > 2.65] = 2P[u > 1.63] = 0.104$$

and 

$$P[\chi'^2 > 4.43] = 2P[u > 2.10] = 0.036$$

Now, if a ten per cent level test is used, the exact test leads to the rejection 
of the hypothesis of independence, but application of Yates' correction 
leads to the conclusion that we fail to reject this hypothesis. 
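
The corrected and uncorrected statistics, and the tail probabilities obtained from Eq. (15.20), can be sketched as follows. The function names are illustrative assumptions; the tail probability uses the identity 2P[u > sqrt(k)] = erfc(sqrt(k/2)) for a standard normal deviate u. Small differences from the figures above arise only because the text rounds the normal deviates to two decimals. 

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d, yates=False):
    """Eq. (15.17), or Eq. (15.18) when the Yates correction is requested."""
    n = a + b + c + d
    difference = abs(a * d - b * c)
    if yates:
        difference -= n / 2
    return n * difference ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def upper_tail_one_df(k):
    """P[chi-square > k] with one degree of freedom, via Eq. (15.20)."""
    return erfc(sqrt(k / 2))


# Table 15.6: not inoculated (10 attacked, 3 not), inoculated (2 attacked, 5 not).
uncorrected = chi_square_2x2(10, 3, 2, 5)
corrected = chi_square_2x2(10, 3, 2, 5, yates=True)

p_uncorrected = upper_tail_one_df(uncorrected)
p_corrected = upper_tail_one_df(corrected)

print(f"uncorrected: {uncorrected:.2f}, tail probability {p_uncorrected:.3f}")
print(f"corrected:   {corrected:.2f}, tail probability {p_corrected:.3f}")
```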

In the particular case where there are two columns (rows) and r rows 
(c columns) in a contingency table, it is easy to show that the $\chi'^2$ statistic 
in Eq. (15.15) reduces to 

$$\chi'^2 = \sum_{j=1}^{r} \frac{(n_{1j}n_{2.} - n_{2j}n_{1.})^2}{n_{1.}\,n_{2.}\,n_{.j}} \tag{15.21}$$

which is approximately distributed as $\chi^2$ with r - 1 (c - 1) degrees of 
freedom. If we let 

$$p_j = \frac{n_{1j}}{n_{.j}} \qquad (j = 1, \ldots, r)$$

and 

$$p = \frac{n_{1.}}{n}$$

it can be shown that Eq. (15.21) reduces to 

$$\chi'^2 = \frac{\sum_j n_{1j}\,p_j - n_{1.}\,p}{p(1 - p)} \tag{15.22}$$

This expression, due to Snedecor [34], is a good computational form. 


15.5. TESTS OF HOMOGENEITY OF SEQUENCES 

In the usual contingency table the n observations are free to fall in any 
of the cr cells. In other situations the frequency table may have the same 
notation and general appearance, but be quite different in interpretation. 
(An analogous situation is found in the one- and two-way classification in 
analysis of variance.) 

Suppose c samples of sizes $n_{1.}, \ldots, n_{i.}, \ldots, n_{c.}$ are independently drawn 
from the same population and suppose the $n_{i.}$ observations in sample i are 
classified in r mutually exclusive categories. Let $n_{ij}$ denote the number of 
observations of the $i$th sample falling in the $j$th group. The resulting observed 
frequency table will look exactly like Table 15.5. However, in this case we 
think of c independent sequences of frequencies, each classified in r cells, 
rather than a single sequence of frequencies classified in cr cells. 

If $p_1, \ldots, p_j, \ldots, p_r$ denote the true probabilities of the population 
values falling in the r categories, then for the $i$th sample the probability of 
a sample value falling in the $j$th category is $p_j$. Thus, for each of the c columns 
in Table 15.5 we assume that 

$$\sum_{j=1}^{r} p_j = 1 \tag{15.23}$$

In certain investigations one wishes to test to determine whether c 
samples, each classified according to the same r categories, could have been 
drawn from the same population, that is, to determine if each of the c samples 
has true probabilities $p_1, \ldots, p_j, \ldots, p_r$. It can be shown [9, 14] that the 
proper statistic to use in such a test is given by Eq. (15.15). If the hypothesis 
is rejected, we say that the samples are drawn from different populations. 
If we fail to reject the hypothesis, we say that there is not enough evidence 
to claim that the populations are different. Thus, this is sometimes called 
a test of homogeneity of sequences. 

There are two cases of special interest. In the first, suppose that c samples, 
each of size n, are drawn from binomial populations. Let $n_{11}, \ldots, n_{i1}, \ldots, n_{c1}$ 
denote the number of successes in each of the c samples of n independent trials. Then 
$n_{i2} = n - n_{i1}$ denotes the number of failures in the $i$th sample. The observed 
frequencies and marginal totals are shown in Table 15.8. 

Table 15.8 

Observed Frequencies for c Binomial Samples of Equal Size 

                               Sample 
               1            ...        i            ...        c            Total 

Successes   $n_{11}$        ...   $n_{i1}$          ...   $n_{c1}$          $n_{.1}$ 
Failures    $n - n_{11}$    ...   $n - n_{i1}$      ...   $n - n_{c1}$      $cn - n_{.1}$ 

Totals      n               ...   n                 ...   n                 cn 

Under the assumption that each of the c samples is randomly drawn from a 
binomial population with probability of success p, it can be shown that the maximum 
likelihood estimator $e_i$ of the theoretical frequency of success in the $i$th 
sample is 

$$e_i = \bar{n}_{.1} \qquad (i = 1, \ldots, c) \tag{15.24}$$

where $\bar{n}_{.1} = n_{.1}/c$ denotes the mean number of successes over all samples. 
To test the null hypothesis that all samples are drawn from the same binomial 
population, we may use the statistic $\chi'^2$ given in Eq. (15.15) or its 
equivalent form 

$$\chi'^2 = \frac{\sum_{i=1}^{c} (n_{i1} - \bar{n}_{.1})^2}{\bar{n}_{.1}\left(1 - \dfrac{\bar{n}_{.1}}{n}\right)} \tag{15.25}$$

The statistic in Eq. (15.25) is sometimes called the binomial index of 
dispersion. 

If n is large and p is small so that $np = \mu$, a constant, we may use the 
Poisson distribution to approximate the binomial distribution. Then 
$\bar{n}_{.1}/n$ as a maximum likelihood estimator of p is small and $1 - \bar{n}_{.1}/n$ is 
approximately equal to one. Furthermore, the right-hand side of Eq. (15.25) 
reduces to 

$$\chi'^2 = \frac{\sum_{i=1}^{c} (n_{i1} - \bar{n}_{.1})^2}{\bar{n}_{.1}} \tag{15.26}$$

which is known as the Poisson index of dispersion [35]. 
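
That Eq. (15.25) is only another way of writing Eq. (15.15) for equal-sized binomial samples can be checked numerically. The sketch below uses hypothetical data (four samples of 25 trials each) invented for illustration; the function names are likewise assumptions. 

```python
def chi_square_table(table):
    """General contingency-table statistic of Eq. (15.15)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat


def binomial_index_of_dispersion(successes, trials):
    """Eq. (15.25): squared deviations from the mean number of successes."""
    mean = sum(successes) / len(successes)
    return sum((s - mean) ** 2 for s in successes) / (mean * (1 - mean / trials))


# Hypothetical data: successes in four samples, each of 25 independent trials.
successes = [12, 15, 9, 14]
trials = 25

# Lay the same data out as in Table 15.8: a successes row and a failures row.
table = [successes, [trials - s for s in successes]]

general = chi_square_table(table)
index = binomial_index_of_dispersion(successes, trials)
print(f"Eq. (15.15): {general:.6f}   Eq. (15.25): {index:.6f}")
```

The two printed values agree, which illustrates why the shorter form is convenient when all samples are the same size. 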

Example 15.7. In ten consecutive weeks the number of illegal left-hand 
turns at a busy stoplight were as follows: 27, 21, 23, 19, 18, 25, 23, 28, 24, 22. 
Use a five per cent level test to determine whether the number of left-hand 
turns is nonheterogeneous. 

Assuming the number of left-hand turns per week to have the Poisson 
distribution, we use (15.26) to find 

$$\chi'^2 = \frac{\sum (n_{i1} - \bar{n}_{.1})^2}{\bar{n}_{.1}} = \frac{(27 - 23)^2 + \cdots + (22 - 23)^2}{23} = \frac{92}{23} = 4.00$$

For nine degrees of freedom $\chi^2_{.05} = 16.9$. Thus, we fail to reject the hypothesis 
that the data came from the same Poisson distribution. That is, on the basis 
of these results we cannot say that the data came from different Poisson 
distributions. 
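
The Poisson index of dispersion for Example 15.7 takes only a few lines to compute. This is a minimal sketch with illustrative names; the critical value 16.9 is the one quoted in the example. 

```python
# Weekly counts of illegal left-hand turns from Example 15.7.
counts = [27, 21, 23, 19, 18, 25, 23, 28, 24, 22]


def poisson_index_of_dispersion(values):
    """Eq. (15.26): sum of squared deviations from the mean, divided by the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / mean


mean_count = sum(counts) / len(counts)
statistic = poisson_index_of_dispersion(counts)
df = len(counts) - 1
critical_value = 16.9  # five per cent point for nine degrees of freedom, from the text

print(f"mean = {mean_count}, chi-square = {statistic:.2f} with {df} degrees of freedom")
print("reject H0" if statistic > critical_value else "fail to reject H0")
```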

The $\chi'^2$ test for counted data may be related to the F test for measured 
data. Many of the procedures in analysis of variance have counterparts in 
experiments with counted data. For example, instead of using the $\chi^2$ 
distribution to test the homogeneity of c binomial population means, say, we 
may use it to test hypotheses involving a single degree of freedom 
[15, 18, 19, 20, 22]. For further study the reader is referred to Refs. 
[6, 7, 9, 12, 22, 33, 34]. 

15.6. EXERCISES 

15.1. Two hundred sixty persons of voting age, selected at random, were asked 
two days before a city election which of two candidates, A and B, they 
favored. One hundred forty-three favored A, and 117 favored B. Use a 
five per cent level test to determine whether the opinion in the population 
may be equally divided. 

15.2. In manufacturing a certain type of small unit it is claimed that no more 
than one per cent are defective. In a random sample of 500 units 11 
were found to be defective. What conclusion do you make concerning 
the claim? 

15.3. In 120 fair tosses of a die the following data were obtained: 

      Number of spots    1    2    3    4    5    6 
      Frequency         15   27   18   12   25   23 

      Is there reason to believe the die is not properly balanced? 

15.4. Suppose the number of defective units in a day's production was 
      tabulated by shifts with the following results: 

      Shift         1    2    3 
      Frequency    19   36   25 

      Is there reason to believe the true relative frequencies of defectives are 
      the same for all shifts? 

15.5. During a three-month period there were 145 machine breakdowns. 
      Table 15.9 gives the number of breakdowns for each machine during 
      each shift. Determine if the number of breakdowns for any machine is 
      independent of shift. 

      Table 15.9 

                     Machine 
      Shift      A     B     C     D 

        1        9     5    11    12 
        2        9     n    IS    20 
        3       12     9    12    17 

15.6. Table 15.10 shows the grade distribution of three instructors who taught 
      the same course for the same period of time. Did the instructors give 
      significantly different percentages of the five grades? 

      Table 15.10 

      Instructor     A     B     C     D     E 

         I          20    38   126    18    21 
         II         14    45   183    25    14 
         III        38    48   275    24    33 


15.7. Prove Eq. (15.17). 

15.8. (a) Write all possible patterns where the marginal totals in a two-way 
      frequency table are those given in Table 15.11. (b) Find the probabilities 
      associated with these patterns. (c) What is the exact probability that one 
      would have the following observed pattern of frequencies or a pattern 
      more extreme? 

      (d) What is the comparable probability if Yates' correction is used? 

      (e) What is the comparable probability if Yates' correction is not used? 

15.9. Prove that Eq. (15.15a) reduces to 

      $$\chi'^2 = \frac{n^2}{n_{1.}\,n_{2.}} \left[\sum_j \frac{n_{1j}^2}{n_{.j}} - \frac{n_{1.}^2}{n}\right]$$

      when c = 2. 

15.10. Table 15.12 shows the hair color distribution of boys and girls in a certain 
      town. Use the formula in Exercise 15.9 to test the independence of hair 
      color and sex. Make comments on your findings. 

      Table 15.12 

                            Hair Color 
                Fair    Red    Medium    Dark    Black 

      Boys       450    120     1232      435      27 
      Girls      575    110     1104      422      35 

15.11. Prove that the estimators given in Eq. (15.13) are maximum likelihood 
      estimators of $p_{i.}$ and $p_{.j}$, respectively. 

15.12. Prove that Eq. (15.19) gives the probability of a set of observed values 
      $n_{11}$, $n_{12}$, $n_{21}$, $n_{22}$ in a 2 x 2 contingency table. 

15.13. Prove Eq. (15.22). 

15.14. Justify using Eqs. (15.25) and (15.26) in testing equality of proportions. 

15.15. Twenty boys were randomly selected from each of five high schools in 
a city, and the median height was determined for the total group of 100. 
Table 15.13 shows the number of boys in each high school with heights 
greater than the median and heights equal to or less than the median. 
Test to determine if the five high schools are likely to have boys with 
the same median height. 


      Table 15.13 

                                     High School 
      Frequency                 A     B     C     D     E 

      Greater than             14     9    10     5    12 
      Equal to or less than     6    11    10    15     8 

      Total                    20    20    20    20    20 


15.16. The number of deaths by accidents in each of six large universities 
      during the same academic year were as follows: 

      10, 5, 8, 3, 2, 12 

      Use a five per cent level test to determine whether the number of deaths 
      is nonheterogeneous. 

15.17. Four groups of students were asked the same question. Table 15.14 
      gives frequencies of "yes" and "no" answers. (a) Determine whether there 
      is a significant difference in the percentages of affirmative answers for 
      boys and girls; for high school and college. (b) Note that this is a 2 x 2 
      factorial experiment. Apply the usual analysis of variance to determine 
      the solution to part (a). 

      Table 15.14 

                 High School   High School   College   College 
      Answer        Boys          Girls        Boys      Girls 

      Yes            30            IS           50        25 
      No             70            83           50        75 

      Total         100           100          100       100 

15.18. Prove that $\chi'^2$ given in Eq. (15.5) or Eq. (15.8) is approximately 
      distributed as $\chi^2$ with k - 1 degrees of freedom. 

      Hint: Derive the $\chi^2$ distribution from the multinomial in much the 
      same way that the normal (and thus $\chi^2$ with one degree of freedom) was 
      derived from the binomial distribution. That is, start with the 
      multinomial distribution with k categories, replace factorials with Stirling's 
      approximation, rearrange terms, and take logarithms, expand the 
      logarithmic expression log (1 + x), etc. 

15.19. Use the literature to prepare a derivation of Eq. (15.9). 

15.20. Justify Eq. (15.16) and give a 2 x 2 x 2 illustration. 

DISTRIBUTION-FREE METHODS 


Some methods are presented for testing and comparing properties of 
distributions when their functional forms (or shapes) are unknown. An 
extension of the sign test, Wilcoxon's signed rank test, the Mann-Whitney U test, 
and a median test are described. The problems of testing the randomness 
assumption are discussed, and one procedure is described. 


16.1. INTRODUCTION 

In the great majority of the estimation and test procedures described 
in the other chapters, we assumed the observations to be randomly drawn 
from populations with known functional form. Even though the populations 
were usually assumed to be normal, this is not a crucial requirement so long 
as the functional form is known. For, according to Chap. 3 and Theorem 
3.4, any continuous distribution can be transformed to a normal distribution. 
(Actually, in practice the true nature of the population is seldom known for 
sure, but there is much evidence that mild departures from normality do 
not usually seriously affect the conclusions.) 

When the functional form is unknown, any statistical statements made 
as a result of an experiment cannot depend upon the form of the distribution; 
that is, the statistical statements are distribution-free. For example, the sign 
test of Chap. 8 and the tests of Chap. 15 are distribution-free tests. Since 
distribution-free procedures do not depend upon knowing the forms of 
populations, one does not usually deal with parameters. That is, distributions 
are compared without the use of parameters. Hence, distribution-free pro- 
cedures are also called nonparametric procedures, even though neither is 
wholly appropriate in some cases. 

In all procedures the assumption of randomness is basic. But sometimes 
there is reason to question this assumption. Thus, tests have been developed 
to determine whether the sample values are nonrandom. Two such tests 
are described in this chapter. 

J6 2 EXTENSIONS OF THE SIGN TEST 

In Sect. 8.8 we used the sign test to compare two methods of measuring 
the percentage of starch in potatoes. That is, we tested the hypothesis 
$\mu_1 = \mu_2$ or $\delta = \mu_1 - \mu_2 = 0$. However, in some situations one might be 
interested in hypothesizing that one method is a fixed number of units, 
$\delta_0$, better than the other. In this case, we would require a test of the 
hypothesis $\mu_1 = \mu_2 + \delta_0$ or $\delta = \delta_0$, where $\delta_0$ is a real number. The 
procedure is simply to apply the sign test to the signs of the differences 

$$x_{11} - (x_{21} + \delta_0),\; x_{12} - (x_{22} + \delta_0),\; \ldots,\; x_{1n} - (x_{2n} + \delta_0) \tag{16.1}$$

If the number of positive signs is significantly different from n/2 at the 
$\alpha$ level, we reject the null hypothesis that 

$$\mu_1 = \mu_2 + \delta_0 \tag{16.2}$$

Otherwise, we fail to reject this hypothesis. Usually there is an interval of 
values of $\delta_0$ for which we fail to reject Eq. (16.2). Such an interval is a 
confidence interval for the difference $\delta = \mu_1 - \mu_2$. 
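
The shifted sign test just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `shifted_sign_test` and the paired data are hypothetical, differences that become zero after the shift are discarded, and the two-sided probability is taken from the binomial distribution with p = 1/2.

```python
from math import comb

def shifted_sign_test(x1, x2, delta0=0.0):
    """Two-sided sign test of delta = delta0 for paired samples (sketch)."""
    # Differences x1_i - (x2_i + delta0); zeros are dropped (an assumption here).
    diffs = [a - (b + delta0) for a, b in zip(x1, x2)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    r = sum(d > 0 for d in diffs)          # number of positive signs
    # Under the null hypothesis r is binomial(n, 1/2).
    k = min(r, n - r)
    tail = sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    return r, n, min(1.0, 2 * tail)

# Hypothetical paired measurements, for illustration only.
x1 = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 5.0, 5.7]
x2 = [4.6, 4.9, 5.2, 5.0, 5.1, 5.6, 4.7, 5.0]
r, n, p_value = shifted_sign_test(x1, x2, delta0=0.0)
print(r, n, round(p_value, 4))
```

Repeating the call over a grid of `delta0` values and keeping those for which the hypothesis is not rejected gives the interval of values mentioned in the text.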

16.2.1. Wilcoxon’s Signed Rank Test 

In the paired-t test we used both the order and magnitude properties 
of the observations; in the sign test we applied only the order properties. 
Further, the sign test does not require the assumption of normality as does 
the paired-t test. Wilcoxon [52, 53, 54] described a test for paired values 
which uses both the order and magnitude (of ranks) properties but does not 
require the normality assumption. 

Example 16.1. Illustrate Wilcoxon’s signed rank test with the following 
differences taken from the potato problem of Example 8.6: 

0.2, 0.0, 0.0, 0.1, 0.2, 0.2, 0.3, -0.3, 0.1, 0.2, 0.3, 0.0, -0.1, 0.1, -0.2, 0.1 

After discarding differences of zero, arrange the remaining differences 
according to increasing order of the absolute value of the differences, 
that is, 

0.1, 0.1, -0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, -0.2, 0.3, -0.3, 0.3          (16.3) 

Then assign ranks and attach the sign of the differences. (When the absolute 
value of two or more differences is the same, assign to each the mean of the 
ranks they would have had if all were different. Since, in our example, the 
absolute value of the first five differences is 0.1, we assign the rank 

(1 + 2 + 3 + 4 + 5)/5 = 3 

to each, attaching a minus sign to the rank corresponding to -0.1.) It follows 
that the signed ranks corresponding to the differences in (16.3) are 

3, 3, -3, 3, 3, 8, 8, 8, 8, -8, 12, -12, 12 

respectively. Finally, find the sum, T, of the positive or negative ranks, 
whichever is smaller, and compare with the critical value shown in Table 
16.1. If T is smaller than the critical value, reject the null hypothesis $\delta = 0$ 
and accept the alternative hypothesis, $\delta \ne 0$, that the methods are different; 

Table 16.1 

Critical Values of T for Wilcoxon’s Signed Rank Two-Sided Test* (Absolute values of 
T less than the tabulated values occur with indicated probability. In a one-sided test the 
probabilities are 0.025, 0.01, and 0.005, respectively.) 


Pairs         Probability          Pairs         Probability 
  n       .05     .02     .01        n       .05     .02     .01 

  6        0       -       -        16       30      24      20 
  7        2       0       -        17       35      28      23 
  8        4       2       0        18       40      33      28 
  9        6       3       2        19       46      38      32 
 10        8       5       3        20       52      43      38 
 11       11       7       5        21       59      49      43 
 12       14      10       7        22       66      56      49 
 13       17      13      10        23       73      62      55 
 14       21      16      13        24       81      69      61 
 15       25      20      16        25       89      77      68 


* This table is reproduced from F. Wilcoxon, Some Rapid Approximate Statistical 
Procedures, American Cyanamid Company, Stamford, Connecticut, 1949, Table I, p. 13, 
with the permission of the American Cyanamid Company. 


otherwise, fail to reject the null hypothesis. Since the sum of the absolute 
value of the negative ranks is 

T = 3 + 8 + 12 = 23 

and the five per cent critical value of T when n = 13 is 17, we fail to reject 
the null hypothesis. Note that the same conclusion was made when the sign 
and paired-t tests were used. 

Application of the paired-t, Wilcoxon’s signed rank, and the sign test 
to the same experimental data will not always lead to the same conclusion. 












In the case where the assumptions of the paired-t test hold, all three tests 
are valid. If an $\alpha$-level two-sided (or one-sided) procedure is applied, the 
paired-t test is the most powerful and the sign test the least powerful of the 
three. In fact, it can be shown that the power efficiency of Wilcoxon’s 
signed rank test relative to the paired-t test is near 0.95 for small samples 
and near $3/\pi = 0.955$ for large samples when the true values of $\delta$ are near 
zero. 

The signed rank test may also be applied to test the hypothesis that 
$\delta$ is some specified value $\delta_0$. (In fact, in a single sample, the same test 
procedure may be used to test the hypothesis that the median of a group of 
observations is equal to some specified value, say, $\mu_0$.) The general procedure 
for Wilcoxon’s signed rank test is as follows: 

1. Subtract $\delta_0$ ($\mu_0$) from each difference $d = x_1 - x_2$ (or value x). 

2. Rank the resulting adjusted differences (values) in order of size, 
ignoring sign. In case of ties in adjusted differences (or values), assign 
the mean rank to each tied adjusted difference (value). 

3. Attach the sign of the adjusted difference (or value) to the 
corresponding rank. 

4. Find the sum, T, of the positive or negative ranks, whichever is smaller 
in absolute value. 

5. Compare T with the critical value $T_\alpha$ in Table 16.1. If $|T| < T_\alpha$, reject 
the null hypothesis that the mean difference (value) is $\delta_0$ ($\mu_0$); otherwise, 
fail to reject the null hypothesis. 
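
The five steps above can be sketched as follows, using the differences of Example 16.1. The function name `signed_rank_T` is hypothetical, and the rounding of the adjusted values is an assumption made only to protect the tie detection from floating-point noise.

```python
def signed_rank_T(values, shift=0.0):
    """Return (T, n) for Wilcoxon's signed rank procedure (illustrative sketch)."""
    # Step 1: subtract the hypothesized value; zero differences are discarded.
    adjusted = [round(v - shift, 10) for v in values]
    adjusted = [d for d in adjusted if d != 0]
    # Step 2: rank by absolute size, giving tied values their mean rank.
    adjusted.sort(key=abs)
    n = len(adjusted)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n and abs(adjusted[j]) == abs(adjusted[i]):
            j += 1
        mean_rank = (i + 1 + j) / 2        # mean of ranks i+1, ..., j
        for k in range(i, j):
            ranks[k] = mean_rank
        i = j
    # Steps 3 and 4: sum the ranks by sign and keep the smaller total.
    positive = sum(r for r, d in zip(ranks, adjusted) if d > 0)
    negative = sum(r for r, d in zip(ranks, adjusted) if d < 0)
    return min(positive, negative), n

diffs = [0.2, 0.0, 0.0, 0.1, 0.2, 0.2, 0.3, -0.3,
         0.1, 0.2, 0.3, 0.0, -0.1, 0.1, -0.2, 0.1]
T, n = signed_rank_T(diffs)
critical = 17                  # Table 16.1, n = 13, probability .05
print(T, n, T < critical)      # step 5: reject only if T < critical
```

For these data the sketch gives T = 23 with n = 13, so the hypothesis is not rejected, in agreement with Example 16.1.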

Table 16.1 is not adequate for more than 25 pairs. In such a case it can 
be shown that T is approximately normally distributed with mean 

$$\mu_T = \frac{n(n+1)}{4} \tag{16.4}$$

and variance 

$$\sigma_T^2 = \frac{n(n+1)(2n+1)}{24} \tag{16.5}$$

Therefore 

$$u = \frac{T - \mu_T}{\sigma_T}$$

is approximately normally distributed with zero mean and unit variance. 
Thus, in experiments with more than 25 pairs of observations, the normal 
distribution may be used as an approximate test of the significance of the 
difference in means. 
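
As a rough illustration of the large-sample approximation, the sketch below standardizes T with the mean and variance of Eqs. (16.4) and (16.5). The values T = 200 and n = 36 are hypothetical, and the function name `signed_rank_z` is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist

def signed_rank_z(T, n):
    """Standardize the signed rank sum T for n nonzero pairs (sketch)."""
    mu = n * (n + 1) / 4                              # Eq. (16.4)
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)      # square root of Eq. (16.5)
    return (T - mu) / sigma

z = signed_rank_z(T=200, n=36)                # hypothetical values
p_two_sided = 2 * NormalDist().cdf(-abs(z))   # two-sided normal probability
print(round(z, 3), round(p_two_sided, 4))
```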


16.2.2. The Walsh Test 

Walsh [45, 46] describes another very powerful nonparametric test which 
is based on ranking the differences $d_i = x_{1i} - x_{2i}$ (i = 1, ..., n). In order 
for the test to be valid, it is assumed that the n differences are independently 
drawn from n symmetrical populations, each having the same mean 
(median) $\mu$. 

16.3. TESTS FOR TWO INDEPENDENT SAMPLES 

Comparisons of similar characteristics of two or more independent 
samples are given in earlier sections. In Sect. 8.6 the two-sample t test was 
used to test the null hypothesis that two population means, $\mu_1$ and $\mu_2$, are 
equal. For this purpose it was assumed that random samples of sizes $n_1$ and 
$n_2$ are independently drawn from two normal populations with equal 
variances. In Chaps. 10 and 11 we described extensions of this problem to 
k normal populations with equal variances. Sometimes one or more of the 
assumptions for the two-sample t test or k-sample F test are not valid. Thus, 
in Chap. 15 we described methods for comparing two or more populations 
with unknown distributions. For example, Fisher’s exact probability test 
(Sect. 15.4) and the test of homogeneity of sequences (Sect. 15.5) were 
discussed. 

In this section we describe three more useful distribution-free tests which 
require that the sample values be ranked (ordered). In these tests it is assumed 
that random and independent samples are drawn from two continuous 
distributions which have the same form but possibly different values of the 
location parameter (e.g., mean or median). Thus, under the usual null 
hypothesis, the random and independent samples 

$$x_1, \ldots, x_{n_1} \quad \text{and} \quad y_1, \ldots, y_{n_2} \tag{16.6}$$

are assumed to come from a single population. The alternative hypothesis, 
expressed in terms of a difference in location parameters, may be either 
one-sided or two-sided. 

16.3.1. The Rank-Sum Test 

Under the null hypothesis that there is a single population, the two 
samples in (16.6) may be combined to give a single random sample of size 
$n = n_1 + n_2$. Arrange these n observations in order of increasing size, 
assigning score 1 to the smallest value, score 2 to the second smallest, ..., 
n to the largest, preserving the identity of the samples. There is no loss in 
generality if we assume $n_1 \le n_2$. Let T denote the total of the ranks of the 
sample with $n_1$ observations. Then the smallest value of T is 

$$1 + 2 + \cdots + n_1 = \frac{n_1(n_1 + 1)}{2} \tag{16.7}$$

and the largest is 

$$n + (n - 1) + \cdots + (n - n_1 + 1) = \frac{n_1(n_1 + 2n_2 + 1)}{2} \tag{16.8}$$

The sampling distribution of T is used to test the significance of the difference 
in location parameters. Either one-sided or two-sided tests may be described. 
Extremely large or extremely small values indicate that the location 
parameters of two populations are different. In order better to understand the 
nature of the sampling distribution of T and the use of Table 16.3, consider 
the following example. 

Example 16.2. Find the sampling distribution of T for samples of sizes 
four and five. 

In the example $n_1 = 4$, $n_2 = 5$, and n = 9. The ranks of the combined 
sample of nine observations are 

1, 2, 3, 4, 5, 6, 7, 8, 9          (16.9) 

Since the extreme values of the sum of four ranks, T, are, by (16.7) and 
(16.8), 4(5)/2 = 10 and 4(15)/2 = 30, the possible values of T are 10, 11, ..., 30. 
We require the number of ways in which a specified T can be obtained. 


Table 16.2 

Sampling Distribution of Rank Totals T of Samples of Size Four in 
Combination with Samples of Size Five 

Rank                      Relative      Cumulative 
Total (T)   Frequency     Frequency     Rel. Freq. 

   10           1           0.008         0.008 
   11           1           0.008         0.016 
   12           2           0.016         0.032 
   13           3           0.024         0.056 
   14           5           0.040         0.096 
   15           6           0.048         0.144 
   16           8           0.063         0.207 
   17           9           0.071         0.278 
   18          11           0.087         0.365 
   19          11           0.087         0.452 
   20          12           0.095         0.548 
   21          11           0.087         0.635 
   22          11           0.087         0.722 
   23           9           0.071         0.793 
   24           8           0.063         0.856 
   25           6           0.048         0.904 
   26           5           0.040         0.944 
   27           3           0.024         0.968 
   28           2           0.016         0.984 
   29           1           0.008         0.992 
   30           1           0.008         1.000 


There are $\binom{9}{4} = 126$ different ways to select four scores from 
among the nine in (16.9); that is, the total number of ways (frequency) in 
which T is found is 126. The frequencies of specific rank sums shown in 
Table 16.2 may be obtained by an exhaustive listing of sums of four ranks. 
For example, the frequencies for T = 10, 11, 12, 13 are 1, 1, 2, 3, respectively, 
since 1 + 2 + 3 + 4 = 10; 1 + 2 + 3 + 5 = 11; 1 + 2 + 3 + 6 = 12; 
1 + 2 + 4 + 5 = 12; 1 + 2 + 3 + 7 = 13; 1 + 2 + 4 + 6 = 13; 
1 + 3 + 4 + 5 = 13. This method is satisfactory for small samples, but as the 
samples become larger a better counting technique becomes more desirable. 
Such a procedure is now described. 
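
For small samples the exhaustive listing is easy to carry out by machine. The following sketch enumerates every selection of four ranks out of nine and tabulates the rank totals; the function name `rank_sum_frequencies` is hypothetical.

```python
from collections import Counter
from itertools import combinations

def rank_sum_frequencies(n1, n2):
    """Count how often each rank total T arises over all selections of n1 ranks."""
    n = n1 + n2
    return Counter(sum(c) for c in combinations(range(1, n + 1), n1))

freq = rank_sum_frequencies(4, 5)
total = sum(freq.values())
cumulative = 0
for T in sorted(freq):
    cumulative += freq[T]
    # Columns: T, frequency, relative frequency, cumulative relative frequency.
    print(T, freq[T], round(freq[T] / total, 3), round(cumulative / total, 3))
```

The printed rows correspond to the columns of Table 16.2.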


If we let $f(T; n_1, n_2)$ denote the number of ways of obtaining the specific 
rank total T when $n_1$ and $n_2$ are the number of observations in two groups, 
it can be shown that 

$$f(T; n_1, n_2) = f(T; n_1, n_2 - 1) + f(T - n; n_1 - 1, n_2) \tag{16.10}$$

For the highest rank $n = n_1 + n_2$ is either used in finding the rank total 
T, or it is not. When n is not used, the frequency of the rank sum T is given 
by $f(T; n_1, n_2 - 1)$. When n is used, the frequency of the rank sum T is given 
by $f(T - n; n_1 - 1, n_2)$. Formula (16.10) is particularly useful in computing 
a sequence of frequency tables of T. That is, after tables have been computed 
for small $n_1$ and $n_2$ by an exhaustive listing, (16.10) may be applied to obtain 
frequency tables for larger $n_1$ and $n_2$. 
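
The recursion (16.10) lends itself to a short memoized function. In the sketch below the starting tables are replaced by trivial boundary cases (no ranks selected, or every rank selected); that choice, like the function name `f`, is an assumption of this illustration rather than the procedure described in the text.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(T, n1, n2):
    """Number of ways to obtain rank total T from n1 of the ranks 1..n1+n2."""
    if T < 0:
        return 0
    if n1 == 0:                       # nothing selected: only T = 0 is possible
        return 1 if T == 0 else 0
    if n2 == 0:                       # every rank selected: T must be 1 + ... + n1
        return 1 if T == n1 * (n1 + 1) // 2 else 0
    n = n1 + n2
    # Highest rank n is either not used or used, as in Eq. (16.10).
    return f(T, n1, n2 - 1) + f(T - n, n1 - 1, n2)

row = [f(T, 4, 5) for T in range(10, 31)]
print(row)
```

The list reproduces the frequency column of Table 16.2 without any exhaustive listing.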

Table 16.2 is typical of sampling distributions of T in at least two respects. 
The relative frequencies may be considered to be probabilities, since each 
of the $\binom{n}{n_1}$ ways in which T is computed is considered to be equally likely. 
Every distribution of T is symmetric. Thus, the mean of the sampling 
distribution of T is given by 

$$\mu_T = \frac{1}{2}\left[\frac{n_1(n_1 + 1)}{2} + \frac{n_1(n_1 + 2n_2 + 1)}{2}\right]$$

or 

$$\mu_T = \frac{n_1(n_1 + n_2 + 1)}{2} \tag{16.11}$$

and the probability of obtaining a particular value of T is the same as the 
probability of obtaining 

$$T' = n_1(n_1 + n_2 + 1) - T \tag{16.12}$$

We call T' the conjugate of T and illustrate how it is used in the following 
example. 

Example 16.3. One might expect that the speed with which turtles swim 
across a tank of water to a platform on the opposite side depends on the 
temperature of the water. In an experiment (hypothetical) four turtles took 
85, 210, 432, 183 sec in 33°C water and five turtles took 72, 89, 13, 56, 145 
sec in 42°C water. Use the T statistic to test the hypothesis that there is no 
difference in swimming time at 33°C and 42°C temperatures against the 
alternative hypothesis that turtles swim faster in 42°C water. 

First, rank the nine observations according to increasing number of 
seconds and find that the smaller sample has ranks 4, 7, 8, 9. Thus T = 28. 
According to Table 16.2, the probability of the ranks’ being this large or 
larger is 0.016 + 0.008 + 0.008 = 0.032. Hence, at the five per cent level, 
we reject the null hypothesis and conclude that turtles swim faster in 42°C 
water than in 33°C water. We might conclude from this that the cold-blooded 
turtle finds hot water noxious. 

Note that T' = 4(10) - 28 = 12 and that the probability that T' is equal 
to or less than 12 is 0.032. Further, note that T' is the value obtained by 
summing the ranks of the observations arranged so that score 1 is assigned 
to the largest value, score 2 to the second largest, etc. That is, 
T' = 1 + 2 + 3 + 6 = 12, since the ranks of the four values 432, 210, 183, 85 
are 1, 2, 3, 6, respectively. Thus the frequencies of only those values of 
T from the smallest to the mean of the T distribution need be computed 
and recorded. In our example this includes T = 10, 11, ..., 20. 
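
The arithmetic of Example 16.3 can be checked with a short sketch. The variable names are hypothetical, and the exact tail probabilities are obtained here by brute-force enumeration rather than by reading Table 16.2.

```python
from itertools import combinations

cold = [85, 210, 432, 183]       # times in 33 degree water (smaller sample)
warm = [72, 89, 13, 56, 145]     # times in 42 degree water

combined = sorted(cold + warm)
rank = {v: i + 1 for i, v in enumerate(combined)}    # no ties in these data
n1, n2 = len(cold), len(warm)

T = sum(rank[v] for v in cold)           # rank total of the smaller sample
T_conj = n1 * (n1 + n2 + 1) - T          # conjugate, Eq. (16.12)

# Exact tail probabilities from all equally likely selections of n1 ranks.
sums = [sum(c) for c in combinations(range(1, n1 + n2 + 1), n1)]
p_upper = sum(s >= T for s in sums) / len(sums)
p_lower_conj = sum(s <= T_conj for s in sums) / len(sums)
print(T, T_conj, round(p_upper, 3), round(p_lower_conj, 3))
```

Both probabilities equal 4/126, which illustrates why only the lower half of the distribution needs to be tabulated.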

Due to the symmetry property the lower part of the T distribution may 
be used in either one-sided or two-sided test procedures. In order to apply 
a two-sided test we require a critical point, $T_\alpha$, for which $\alpha/2$ of the rank 
sums in the appropriate sampling distribution lie below. The samples are 
declared significantly different at the $\alpha$ level if either T or T' lies below this 
critical point $T_\alpha$. In a one-sided test procedure $T_\alpha$ is that rank sum below 
which $\alpha$ of the rank sums lie. In the one-sided test use either the statistic 
T or the statistic T', whichever is appropriate. 

From Table 16.2 we find that 5.6 per cent of the T values are equal to 
or less than 13, and 3.2 per cent are equal to or less than 12. Since there is 
no value of T below which exactly five per cent of the rank sums lie, we take 
as the critical point $T_\alpha$ that value below which five per cent or less of the rank 
sums lie; that is, in this case $T_\alpha = 12$. The five per cent critical value defined 
in this way is not usually true to its name. However, the discrepancy is not 
usually considered serious. The value $T_\alpha = 12$ may be used as a five per 
cent point in a one-sided test or a ten per cent point in a two-sided test. 
For a one-sided 2.5 per cent test or a two-sided five per cent test $T_\alpha = 11$, 
the true significance levels being 1.6 per cent and 3.2 per cent, respectively. 
Table 16.3 has been constructed in this manner for all meaningful values of 
$n_1$ and $n_2$ for which $n_1 + n_2 \le 30$ and $n_1 \le n_2$. 
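
The rule for choosing a critical point can be sketched as a search for the largest rank total whose cumulative probability does not exceed the chosen level. The function name `lower_critical_point` is hypothetical, and the enumeration is practical only for small samples.

```python
from itertools import combinations

def lower_critical_point(n1, n2, alpha):
    """Largest T with P(rank total <= T) <= alpha, or None if there is none."""
    n = n1 + n2
    sums = sorted(sum(c) for c in combinations(range(1, n + 1), n1))
    total = len(sums)
    critical = None
    for T in range(sums[0], sums[-1] + 1):
        cumulative = sum(s <= T for s in sums) / total
        if cumulative <= alpha:
            critical = T
        else:
            break
    return critical

print(lower_critical_point(4, 5, 0.05))    # one-sided five per cent point
print(lower_critical_point(4, 5, 0.025))   # one-sided 2.5 per cent point
```

For samples of sizes four and five the two calls give 12 and 11, the values quoted in the text.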

In all the above discussion of the rank-sum test we have ignored the 
case of tied values of the observations. In case of ties, the usual procedure, 
although there are good alternatives, is to find the mean of ranks that the 
tied observations would have if they were distinguishable, and then to assign 
each this mean rank. When the tied observations fall in one sample, the test 
is not affected; when observations in different samples are tied, the test is 
not seriously affected. 

It is not difficult to show that the variance of the sampling distribution 
of T is 

$$\sigma_T^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12} \tag{16.13}$$


Table 16.3 

Critical Values for Rank Sums* 

For a two-sided test they are 5 per cent (or less) values. 
For a one-sided test they are 2.5 per cent (or less) values. 



* This table is reproduced from C. White, “The Use of Ranks in a Test of Significance 
for Comparing Two Treatments,” Biometrics, Vol. 8 (1950), Tables for .05 and .01, Critical 
Points of Rank Sums, pp. 37, 38, with permission of the editor of the journal. 

Table 16.3 (cont.) 

Critical Values for Rank Sums 

For a two-sided test they are 1 per cent (or less) values. 
For a one-sided test they are 0.5 per cent (or less) values. 



and that the distribution of T approaches the normal as $n_1$ and $n_2$ increase. 
Therefore, approximate critical points, $T_\alpha'$, may be found from 

$$u = \frac{T_\alpha' - \dfrac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} \tag{16.14}$$

with the use of the standard normal. For values of $n_1$ and $n_2$ outside the 
range of Table 16.3 the approximation is very good when $\alpha$ = 5 per cent, 
and, also, very good when $\alpha$ = 1 per cent unless $n_1$ is small. In fact, when 
$n_1 + n_2 = 30$ and $\alpha$ = 5 per cent, the approximate values $T_\alpha'$ are the same 
as $T_\alpha$ of Table 16.3 in 11 of 14 cases, and only one unit higher in the 
remaining three. 
If all the assumptions of the two-sample t test hold, the rank-sum test 
is valid. It can be shown [13] that the power efficiency of the rank-sum test 
relative to the two-sample t test is near 0.95 when the true difference in 
location parameter is near zero. Thus, in order to provide the same power, 
approximately five per cent more observations are required for the rank-sum 
test than for the two-sample t test. However, for nonnormal populations 
the rank-sum test may be more powerful [13, 41] than the two-sample t test 
(in which case it should be noted that the two-sample t test is not valid and 
the rank-sum test is). 

There are possible advantages as well as disadvantages to using the 
rank-sum test and other ranking procedures. Some of the advantages are 

1. The calculations are simpler. The sample statistic is very easy to 
compute. Most of the work is in ranking the observations, and this 
can be made easier with shortcut techniques 

2. The assumptions are few and not very restrictive. The form of the 
distributions need not be known. Knowledge of the population mean 
and variance are not required 

3. The data may be in ordinal form only 

4. Real differences in location parameters may be more easily detected. 
This is particularly true when the populations deviate considerably 
from normality or the location parameters are not means 

Some of the disadvantages are 

1. There may be a loss in efficiency. Some information may be sacrificed 
when the quantitative nature of the data is replaced by ranks 

2. It is not possible to establish confidence limits for the difference in 
location parameters of two populations. The hypothesis of equality 
of location parameters may be tested against either a one-sided or 
a two-sided alternative, but an interval estimate for this difference 
is not possible 

The rank-sum statistic was described for equal sample sizes by 
Wilcoxon [52, 54]. The relation of the rank-sum test to other ranking pro- 
cedures is discussed in Refs. [41, 44]. Mann and Whitney [29] proposed an 
alternative test procedure for unequal sample sizes. This test is described 
in the next section. 

16.3.2. The U Test 

The test procedure described by Mann and Whitney [29] requires that 
the observations in the two independent samples of (16.6) be arranged in 
ascending order with the identity of the samples preserved. They define 
a statistic, U, to be equal to the number of times a y precedes an x. 

As an illustration, suppose the nine observations of Example 16.3 are 
arranged in increasing order with “times” associated with 33°C water 
designated by y and those associated with 42°C water designated by x. 
Thus, the order and identity of the sample values are as follows: 

Order of values:       13    56    72    85    89    145    183    210    432 
Identity of sample:     x     x     x     y     x      x      y      y      y 

Since the y score 85 precedes each of the x scores 89 and 145, and since 
the y scores 183, 210, and 432 do not precede any x scores, the sample value 
of the U statistic is 

U = 2 + 0 + 0 + 0 = 2 

The test, in its original form, was designed to test the null hypothesis 

$H_0$: the x and y values have the same distribution          (16.15) 

against the alternative hypothesis 

$H_1$: the location parameter of y is larger than the 
location parameter of x; that is, the bulk of the          (16.16) 
distribution of y’s is to the right of the bulk of 
the distribution of x’s 

If $H_1$ is true we expect U to be small. Mann and Whitney [29] computed 
tables which give probabilities associated with small (lower tail) values of 
U, and Auble [3] gives tables of critical values of U for significance levels 
of 0.001, 0.01, 0.025, and 0.05 for a one-sided test. For the one-sided 
alternative hypothesis that the location parameter of y is smaller than the 
parameter of x we compute the statistic U', defined to be the number of times 
an x precedes a y, and use Auble’s tables to test $H_0$. For a two-sided alternative 
hypothesis we compute U or U', whichever is smaller, and use Auble’s tables, 
understanding that the significance levels are now 0.002, 0.02, 0.05, and 0.10. 
We do not give tables of the U statistic, since it can be shown that this 
statistic and the T statistic of Sect. 16.3.1 give identical results. In fact, it 
is not difficult to show that 

$$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - T_1 \tag{16.17}$$

where $T_1$ denotes the sum of the ranks assigned to the sample of size $n_1$, 
or that 

$$U' = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - T_2 \tag{16.17a}$$

where $T_2$ denotes the sum of the ranks assigned to the sample of size $n_2$. 
(It is to be understood that the ranks of a sample are those obtained by 
arranging the $n_1 + n_2$ observations of the two independent samples according 
to ascending order.) In the illustration above the y sample has $n_1 = 4$ 
observations and rank sum $T_1 = 28$, so that U = 20 + 10 - 28 = 2, while 
$T_2 = 17$ gives U' = 20 + 15 - 17 = 18. The U and U' statistics are usually 
computed by Eq. (16.17) and Eq. (16.17a), since it is tedious to compute them 
by using the definitions when $n_1$ and $n_2$ become fairly large. 
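
The agreement between the counting definition and the rank-sum formulas can be verified on the turtle data. In this sketch the names `n_y`, `n_x`, `T_y`, and `T_x` are hypothetical labels for the sizes and rank sums of the y and x samples, and the data are assumed to contain no ties.

```python
y = [85, 210, 432, 183]       # 33 degree water
x = [72, 89, 13, 56, 145]     # 42 degree water

# Definitions: count the pairs in which a y precedes an x, and the reverse.
U = sum(1 for yi in y for xj in x if yi < xj)
U_prime = sum(1 for yi in y for xj in x if xj < yi)

# Rank sums from the combined ascending order.
combined = sorted(x + y)
rank = {v: i + 1 for i, v in enumerate(combined)}
n_y, n_x = len(y), len(x)
T_y = sum(rank[v] for v in y)
T_x = sum(rank[v] for v in x)

# Rank-sum formulas: the y rank sum gives U, the x rank sum gives U'.
U_from_ranks = n_x * n_y + n_y * (n_y + 1) // 2 - T_y
U_prime_from_ranks = n_x * n_y + n_x * (n_x + 1) // 2 - T_x
print(U, U_from_ranks, U_prime, U_prime_from_ranks)
```

The two statistics always sum to the product of the sample sizes, here 2 + 18 = 20.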

Under the null hypothesis (16.15) it can be shown that the mean and 
variance of the sampling distribution of U are given by 

$$\mu_U = \frac{n_1 n_2}{2} \tag{16.18}$$

and 

$$\sigma_U^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12} \tag{16.19}$$

In fact, when both $n_1$ and $n_2$ are larger than eight, the statistic U is 
approximately normally distributed, and the larger the sample sizes, the 
better the approximation. 
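
A brief sketch of the normal approximation for U, using the mean and variance of Eqs. (16.18) and (16.19), is given below. The value U = 30 and the sample sizes 10 and 12 are hypothetical, as is the function name `u_test_normal`.

```python
from math import sqrt
from statistics import NormalDist

def u_test_normal(U, n1, n2):
    """Return the standardized U and its lower-tail normal probability."""
    mu = n1 * n2 / 2                                  # Eq. (16.18)
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)        # square root of Eq. (16.19)
    z = (U - mu) / sigma
    return z, NormalDist().cdf(z)

z, p_lower = u_test_normal(U=30, n1=10, n2=12)    # hypothetical values
print(round(z, 3), round(p_lower, 4))
```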

Since the U statistic is equivalent to the T statistic of Sect. 16.3.1, the 
statements about the relative efficiency of rank-sum tests apply to the 
Mann-Whitney U test. Also, the rank-sum and Mann-Whitney test pro- 
cedures have the same advantages and disadvantages. For a discussion of 
tied scores and other topics the reader is referred to Refs. [16, 48, 49]. 


16.3.3. The Median Test 

Let the combined $n_1 + n_2$ observations in (16.6) be arranged according 
to increasing order of magnitude with the identity of the samples preserved. 
Let $r_l$ ($r_r$) denote the ratio of the number of y’s to the number of x’s to the 
left (right) of the median of the combined samples. Now, if the null 
hypothesis (16.15) is true, $r_l$ should not differ greatly from $r_r$. However, if 
the median of the distribution of y’s is to the right of the median of the 
distribution of the x’s, then $r_r$ should be larger than $r_l$. This suggests that a 
statistic defined in terms of the number of x’s (y’s) to the right of the combined 
sample median would be useful in testing the null hypothesis against either 
the one-sided or the two-sided alternative hypothesis that the population 
medians are different. 

Let $m_1$ ($m_2$) denote the number of x (y) values larger than the combined 
sample median. Observe that $m_1 + m_2$ is $(n_1 + n_2)/2$ when $n_1 + n_2$ is even 
and is $(n_1 + n_2 - 1)/2$ when $n_1 + n_2$ is odd. If the $n_1 + n_2$ values are assumed 
to be distinct, it can be shown that the probability, $P[m_1]$, of exactly $m_1$ x’s 
(and, of course, exactly $m_2$ y’s) exceeding the combined sample median is 
given by 

$$P[m_1] = \frac{\binom{n_1}{m_1}\binom{n_2}{m_2}}{\binom{n_1 + n_2}{m_1 + m_2}} \tag{16.20}$$





The expression (16.20) is the hypergeometric density function. Thus, to test 
the null hypothesis $H_0$ against the alternative hypothesis that the median of 
the distribution of y values is to the right of the median of the distribution 
of x values, we need to know the probability that $m_1$ is equal to or less than 
the sample value, $m_{10}$; that is, we require 

$$P[m_1 \le m_{10}] = \sum_{m_1 \le m_{10}} P[m_1] \tag{16.21}$$

Lieberman and Owen [28] have computed tables of the hypergeometric 
distribution which may be used to find Eq. (16.21) so as to test $H_0$ against 
the alternative indicated. These tables may also be used for a two-sided 
alternative. 
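
The lower-tail probability needed for the one-sided median test can also be computed directly from the hypergeometric terms. The sketch below applies it to the turtle data of Example 16.3, with the 42°C times taken as the x sample and the 33°C times as the y sample; the function name `median_test_lower_tail` is hypothetical, and distinct observations are assumed.

```python
from math import comb
from statistics import median

def median_test_lower_tail(x, y):
    """Return (m1, m2, P[number of x's above the combined median <= m1])."""
    med = median(x + y)
    m1 = sum(v > med for v in x)      # x values above the combined median
    m2 = sum(v > med for v in y)      # y values above the combined median
    n1, n2, m = len(x), len(y), m1 + m2

    def prob(k):
        # Hypergeometric probability of exactly k x's among the m largest values.
        return comb(n1, k) * comb(n2, m - k) / comb(n1 + n2, m)

    return m1, m2, sum(prob(k) for k in range(m1 + 1))

x = [72, 89, 13, 56, 145]     # 42 degree water
y = [85, 210, 432, 183]       # 33 degree water
m1, m2, p = median_test_lower_tail(x, y)
print(m1, m2, round(p, 4))
```

Here one x value and three y values exceed the combined median of 89, and the lower-tail probability is 21/126.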

It is informative to note that the two independent samples may be 
dichotomized and arranged as shown in Table 16.4. 


Table 16.4 

Grouped Data for Median Test 

                                        Number of Sample Observations 
                                        Sample of     Sample of 
                                        x values      y values      Total 

Greater than the combined median        m1            m2            m1 + m2 
Equal to or less than the combined 
median                                  n1 - m1       n2 - m2       n1 + n2 - (m1 + m2) 

Total                                   n1            n2            n1 + n2 


Thus, when $n_1 + n_2 \le 20$, Fisher’s exact test may be used to test $H_0$; when 
$n_1 + n_2 > 20$, the $\chi^2$ statistic corrected for continuity may be applied. Also, 
when $n_1$ and $n_2$ are quite large, Theorem 6.7 and the methods of Sect. 6.2 may 
be applied to give an approximate test of $H_0$. In this case the statistic 




(16 22) 


approximates the standard normal statistic u 

The median test is not as powerful as the rank-sum, Mann-Whitney, or 
Student’s t tests. However, it has the advantage that it is easily generalized 
to k samples, where the $\chi^2$ test is still appropriate. Also, the median test 
may be extended to apply to any percentile of the grouped data instead of 
the fiftieth percentile (median). For a discussion of these statements, a proof 
of Eq. (16.20), and information on other related topics the reader is referred 
to Refs. [10, 14, 24, 32, 35, 38, 55]. 

16.4. DISTRIBUTION-FREE METHODS IN ANALYSIS OF VARIANCE 

In Sect. 16.2 and 16.3 we described a few procedures which are useful 
in comparing two populations. These test procedures have been generalized 
so that several distribution-free methods are available for comparing more 
than two populations. Kruskal and Wallis [27] described a test based on 
ranks which may be used for the completely randomized design. 
Distribution-free methods for the randomized complete-block design have been 
described by Friedman [19], Mood [31], and Kendall and Smith [25]. Durbin 
[15] has generalized the method of Kendall and Smith to balanced 
incomplete-block designs, and Bradley and Terry [6], Bradley [7, 8, 9] and Abelson 
and Bradley [1] have described methods of paired comparisons which are 
useful in incomplete-block designs. Descriptions of these and other methods 
may be found in Refs. [11, 17, 24, 31, 36, 38, 39]. 

16.5. RANDOMNESS 

Statistical inference is based in some way on the assumption that sample 
values are randomly drawn from one or more populations. So long as the 
sample values are obtained with the aid of a random number table or 
recognized probability methods, the randomness assumption is not likely 
to be questioned. However, in situations where the experimenter (or 
investigator) has little or no control over the selection of the data values, he 
might well question the randomness assumption. For example, a long-range 
prediction of death in highway accidents must be based on whatever records 
are available, so one may wish to test to see if the observations can be 
considered to be random. Also, one may wish to test the randomness 
assumption in an effort to detect the presence (or absence) of assignable causes 
in an investigation of statistical control. 

The effect of time on an experiment has not been taken into account in 
most of the techniques already described. Still, knowing the particular time, 
that is, the order in time, in which observations are made might give valuable 
information on the randomness of the sample values. In fact, we describe 
a test procedure based upon order in time which is useful in testing the usual 
randomness assumption. (A careful study of the effect of “time” in an 
experiment might well require a second volume. Such topics as quality control 
and sampling inspection, time series analysis, sequential test procedures, 
and stochastic processes fall in this category. There are many articles (see 
the bibliography in Ref. [4]) and books [4, 12, 18, 21, 26, 43, 50, 51] written 
on these topics.) 

It is difficult to test for randomness. However, tests have been described 
which detect nonrandomness. This means that it is possible to conclude 
that a given sequence of observations is not random at a specified significance 
level, but that it is not possible to conclude that the sequence is random. 

[Fig. 16.1. Dot Diagrams Indicating Order in Which Observations were Drawn. 
Horizontal axis: order. Panel (d): extreme variation.] 


Before describing a test, we consider some illustrations of 
nonrandomness shown in Fig. 16.1. The horizontal axis designates the order in which 
the observations are drawn, and the vertical axis designates the magnitude 
of the values. Figure 16.1a shows a linear trend in time, and Figs. 16.1b 
and 16.1c illustrate cyclic patterns. There are other patterns of possible 
nonrandomness, but these are adequate for our purposes. They certainly 
indicate that taking into account the order of occurrence of the observations 
might be useful in constructing tests of nonrandomness. 

The reader should realize that the pattern sometimes depends on the 
nature of the population sampled. For example, the pattern in Fig. 16.1d 
might have resulted from randomly selecting the observations from a 
population with a long tail to the right. 

Some common tests of nonrandomness in a sequence of observations 
are based on number of runs, length of runs, runs up and down [30, 34], and 
control charts. Other tests are based on such things as serial correlation 
[2, 42] and mean square of successive differences [23, 33]. 

16.5.1. Use of Runs in Detecting Nonrandomness 

A sequence of i identical symbols which is preceded and followed by a 
different symbol or no symbol is called a run of length i. Thus, in the sequence 

aabaaabbbbbabba (16.23) 

consisting of seven a’s and eight b’s, there are four runs of a’s and three runs 
of b’s. The lengths of the runs of a’s are 2, 3, 1, and 1, respectively; the 
lengths of the runs of b’s are 1, 5, and 2, respectively. The same 15 a’s and 
b’s can be arranged in other ways. For example, the sequence 

bababababababab (16.24) 

has seven runs of a’s and eight runs of b's, each of length 1, and the sequence 

aaaaaaabbbbbbbb (16.25) 

has one run of a’s of length 7 and one run of b's of length 8. Thus, we observe 
that the seven a’s and eight b's can be arranged so that the total number of 
runs ranges from two to 15. 
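
Counting runs and their lengths is a simple grouping exercise. The sketch below applies it to the three sequences (16.23), (16.24), and (16.25); the function name `runs` is hypothetical.

```python
from itertools import groupby

def runs(sequence):
    """Return the runs of a sequence as (symbol, length) pairs, in order."""
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]

for seq in ("aabaaabbbbbabba", "bababababababab", "aaaaaaabbbbbbbb"):
    found = runs(seq)
    a_lengths = [length for symbol, length in found if symbol == "a"]
    b_lengths = [length for symbol, length in found if symbol == "b"]
    # Total number of runs, then the run lengths for each symbol.
    print(seq, len(found), a_lengths, b_lengths)
```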

In the particular case where a represents “heads” and b represents “tails” 
on a fair toss of a well-balanced coin, we might expect sequence (16.23) 
to occur, but we probably would not expect sequences (16.24) and (16.25) 
to occur, since they indicate short- and long-term cyclic patterns, 
respectively. That is, sequences (16.24) and (16.25) indicate nonrandomness, and 
sequence (16.23) could well indicate randomness. Thus, a sequence with too 
many or too few runs could indicate the absence of randomness. That is, 
the total number of runs, r, could be used to test for randomness. 

If there are many runs, each one can be expected to be short; if there 
are few runs, at least one can be expected to be long. Thus, the length of the 
longest run could be used to test randomness. 

Sequences such as (16.23), (16.24), and (16.25) may result from 
sampling a dichotomous population, but such sequences can also be formed 
from continuous populations by letting the observed values above or below 
a given value $x_0$ be indicated by a or b, respectively. For example, in a routine 
production, b could denote nondefective and a defective parts. Further, 
in a sample of college students b could denote an I.Q. score below 110 and 
a an I.Q. score above 110. The value $x_0$ may be any percentile (fractile) value, 
but the median is frequently used. 

Now, consider the sampling distribution of the total number of runs r, 
that is, the sampling distribution of 

$$r = r_1 + r_2$$

where $r_1$ and $r_2$ denote the number of runs of a’s and b’s, respectively. Let 
$n_1$ and $n_2$ denote the number of a’s and b’s, respectively. Thus, the total 
number of observations n is given by 

$$n = n_1 + n_2$$

provided $x_0$ is not an observation. (In case one or more of the observations 
have the value $x_0$, the usual practice is to omit them from the sample, and 
thus from the resulting computations and conclusions. In some cases $x_0$ 
may be selected before the data is collected, so that none of the observed 
values will be $x_0$.) Under the assumption that every arrangement of $n_1$ a’s 
and $n_2$ b’s is equally likely, it can be shown [5, 11, 22, 24, 30] by first finding 
the joint density function of $r_1$ and $r_2$ that the density function of r is given by 


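$$
f(r) =
\begin{cases}
\dfrac{2\dbinom{n_1-1}{k-1}\dbinom{n_2-1}{k-1}}{\dbinom{n}{n_1}}, & r = 2k, \\[2ex]
\dfrac{\dbinom{n_1-1}{k}\dbinom{n_2-1}{k-1} + \dbinom{n_1-1}{k-1}\dbinom{n_2-1}{k}}{\dbinom{n}{n_1}}, & r = 2k+1,
\end{cases}
\qquad k = 1, 2, \ldots
\tag{16.26}
$$

Further, it is not difficult to show that the mean and variance of the sampling 
distribution are given by 

$$ \mu_r = \frac{2 n_1 n_2}{n} + 1 \tag{16.27} $$

and 

$$ \sigma_r^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n)}{n^2 (n - 1)}. \tag{16.28} $$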
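
As a numerical check on the density, mean, and variance of the number of runs, the following Python sketch evaluates the density exactly with rational arithmetic and compares its first two moments with the closed forms. It assumes the usual form of the run-count density (an even number of runs r = 2k or an odd number r = 2k + 1); the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import comb


def runs_pmf(r, n1, n2):
    """P(total number of runs = r) when every arrangement is equally likely."""
    if r < 2 or n1 < 1 or n2 < 1:
        return Fraction(0)
    arrangements = comb(n1 + n2, n1)
    if r % 2 == 0:
        # r = 2k: k runs of each symbol, starting with either symbol.
        k = r // 2
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        # r = 2k + 1: k + 1 runs of one symbol and k runs of the other.
        k = (r - 1) // 2
        ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
    return Fraction(ways, arrangements)


def runs_mean(n1, n2):
    """Closed-form mean of the number of runs."""
    n = n1 + n2
    return Fraction(2 * n1 * n2, n) + 1


def runs_variance(n1, n2):
    """Closed-form variance of the number of runs."""
    n = n1 + n2
    return Fraction(2 * n1 * n2 * (2 * n1 * n2 - n), n * n * (n - 1))


# Seven a's and eight b's, as in sequences (16.23) to (16.25).
n1, n2 = 7, 8
probs = {r: runs_pmf(r, n1, n2) for r in range(2, n1 + n2 + 1)}
mean = sum(r * p for r, p in probs.items())
variance = sum((r - mean) ** 2 * p for r, p in probs.items())
print("probabilities sum to one:", sum(probs.values()) == 1)
print("mean agrees with closed form:", mean == runs_mean(n1, n2))
print("variance agrees with closed form:", variance == runs_variance(n1, n2))
```

Exact fractions are used so that the comparison with the closed forms is not blurred by rounding.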

The density function (16.26) is useful in computing lower, r', and upper, 
r'', critical values of r. In fact, Swed and Eisenhart [40] give, for $n_1 \le n_2 \le 20$, 
the critical values such that 

$$ \sum_{r \le r'_\alpha} f(r) = P(r \le r'_\alpha) \le \alpha \quad \text{for } \alpha = 0.005,\ 0.01,\ 0.025 \text{ and } 0.05 $$



Table 16.5 

Lower 0.025 Critical Values for Number of Runs* 

(The probability is 0.025 or less that r is equal to or less than the value tabled.) 



* This table is abridged from F. S. Swed and C. Eisenhart, “Tables for Testing 
Randomness of Grouping in a Sequence of Alternatives,” Annals of Mathematical 
Statistics, Vol. 14 (1943), Table II, with permission of the editor of the journal. 


Upper 0.025 Critical Values for Number of Runs* 

(The probability is 0.025 or less that r is equal to or greater than the value tabled.) 

n1 \ n2    5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
   4       9   9
   5      10  10  11  11
   6          11  12  12  13  13  13  13
   7              13  13  14  14  14  14  15  15  15
   8                  14  14  15  15  16  16  16  16  17  17  17  17  17
   9                      15  16  16  16  17  17  18  18  18  18  18  18
  10                          16  17  17  18  18  18  19  19  19  20  20
  11                              17  18  19  19  19  20  20  20  21  21
  12                                  19  19  20  20  21  21  21  22  22
  13                                      20  20  21  21  22  22  23  23
  14                                          21  22  22  23  23  23  24
  15                                              22  23  23  24  24  25
  16                                                  23  24  25  25  25
  17                                                      25  25  26  26
  18                                                          26  26  27
  19                                                              27  27
  20                                                                  28

* This table is abridged from F. S. Swed and C. Eisenhart, “Tables for Testing 
Randomness of Grouping in a Sequence of Alternatives,” Annals of Mathematical 
Statistics, Vol. 14 (1943), Table III, with permission of the editor of the journal. 

and 

$$ \sum_{r \le r''_\alpha} f(r) = P(r \le r''_\alpha) \ge \alpha \quad \text{for } \alpha = 0.95,\ 0.975,\ 0.99 \text{ and } 0.995. $$

Parts of their Tables II and III which can be used in a 2.5 per cent one-sided 
test or a five per cent two-sided test are reproduced in Table 16.5. For an 
illustration of a two-sided test consider the following example. 

Example 16.4. On inspecting a row of 22 plants for wilt, the sequence 
of hardy, H, and droopy, D, plants was as follows: 

HHHHDHDDHHHHHHDDDHHHHH 

Use a two-sided five per cent level test for the null hypothesis 

$H_0$: the H's and D's occur in random order 

against the alternative hypothesis 

$H_1$: the H's and D's occur in nonrandom order. 

There are $n_1 = 6$ D's, $n_2 = 16$ H's, and r = 7 runs. (Here $n_1$ refers 
to the number of plants in the smaller category.) According to Table 16.5, 
the lower 2.5 per cent critical value of r is $r'_{0.025} = 5$. Since 7 > 5, we 
fail to reject $H_0$ and conclude that the sample of 22 hardy and droopy plants 
may be arranged in random order. 
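
The table lookup in Example 16.4 can be reproduced by summing the exact density of the number of runs. The Python sketch below is a hedged illustration, not the author's procedure: the function names are hypothetical, the illegible 18th symbol of the printed plant sequence is assumed to be H, and the upper critical value is taken in the sense of the caption of Table 16.5 (the smallest value whose upper tail probability is 0.025 or less).

```python
from fractions import Fraction
from itertools import groupby
from math import comb


def runs_pmf(r, n1, n2):
    """P(total number of runs = r) when every arrangement is equally likely."""
    if r < 2 or n1 < 1 or n2 < 1:
        return Fraction(0)
    k = r // 2 if r % 2 == 0 else (r - 1) // 2
    if r % 2 == 0:
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
    return Fraction(ways, comb(n1 + n2, n1))


def lower_critical(n1, n2, alpha):
    """Largest value c with P(r <= c) <= alpha, or None if there is none."""
    cumulative, critical = Fraction(0), None
    for r in range(2, n1 + n2 + 1):
        cumulative += runs_pmf(r, n1, n2)
        if cumulative > alpha:
            break
        critical = r
    return critical


def upper_critical(n1, n2, alpha):
    """Smallest attainable value c with P(r >= c) <= alpha, or None."""
    tail, critical = Fraction(0), None
    for r in range(n1 + n2, 1, -1):
        tail += runs_pmf(r, n1, n2)
        if tail > alpha:
            break
        if tail > 0:          # skip values of r that cannot occur
            critical = r
    return critical


# Assumption: the garbled 18th symbol of the printed sequence is read as H.
plants = "HHHHDHDDHHHHHHDDDHHHHH"
r = sum(1 for _ in groupby(plants))
n_small, n_large = sorted((plants.count("D"), plants.count("H")))
alpha = Fraction(25, 1000)
lower = lower_critical(n_small, n_large, alpha)
upper = upper_critical(n_small, n_large, alpha)
reject = ((lower is not None and r <= lower)
          or (upper is not None and r >= upper))
print("r =", r, "n1 =", n_small, "n2 =", n_large)
print("lower critical value:", lower, "upper critical value:", upper)
print("reject randomness:", reject)
```

For six D's and sixteen H's no upper critical value exists at this level, which is consistent with the blank cell in the upper part of Table 16.5, so only the lower value enters the decision.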

If $n_1$ and $n_2$ are large, then r is approximately normally distributed with 
mean $\mu_r$ and variance $\sigma_r^2$ as given by Eqs. (16.27) and (16.28). That is, 

$$ \frac{r - \mu_r}{\sigma_r} $$

is approximately normally distributed with mean zero and variance one. 
If both $n_1$ and $n_2$ exceed 20, the normal approximation is quite good. 
When $n_1 < 5$ and $n_2 > 20$, the probabilities for an extreme number of runs 
should be computed directly by Eq. (16.26). In other cases where Table 16.5 
cannot be applied, the normal approximation is probably satisfactory. 
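
For large samples the normal approximation is easy to apply in code. The following Python sketch is an illustration under stated assumptions: the data are a made-up alternating series, the observations are coded as a or b about their median, and all function names are hypothetical.

```python
from math import erf, sqrt
from statistics import median


def dichotomize(values, x0):
    """Code values above x0 as 'a' and below x0 as 'b'; values equal to x0 are omitted."""
    return "".join("a" if v > x0 else "b" for v in values if v != x0)


def count_runs(sequence):
    """Total number of runs in a sequence of symbols."""
    return sum(1 for i, s in enumerate(sequence) if i == 0 or s != sequence[i - 1])


def runs_z(r, n1, n2):
    """Standardized number of runs, (r - mean) / standard deviation."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (r - mean) / sqrt(variance)


def two_sided_p_value(z):
    """Two-sided tail probability of a standard normal deviate."""
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - cdf)


# Hypothetical data: 50 readings that alternate above and below their median.
values = [10 + (-1) ** i * (1 + i / 100) for i in range(50)]
coded = dichotomize(values, median(values))
n1, n2 = coded.count("a"), coded.count("b")
z = runs_z(count_runs(coded), n1, n2)
print("runs:", count_runs(coded), "n1:", n1, "n2:", n2)
print("z:", round(z, 2), "two-sided p-value:", two_sided_p_value(z))
```

A strictly alternating series has far too many runs, so z is large and positive; a series with long stretches on one side of the median would give a large negative z instead.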

16.5.2. Use of Control Charts in Detecting Nonrandomness 

The control chart, in addition to its use in detecting and eliminating 
assignable causes of variation, can also be used to determine whether the 
sequence of observations can be considered a controlled or random sequence. 
Even though the control chart may detect any type of nonrandomness, 
it is particularly useful in detecting extreme variation of individual 
measurements. For details involving the use of control charts the reader is 
referred to Refs. [5, 20, 22, 37]. 

16.6. EXERCISES 

16.1. The claim was made that stimulus S increases blood pressure (systolic) 
by five units. Table 16.6 gives the blood pressure of 16 subjects immediately 
before and m minutes after the application of S. The null hypothesis 
is that the stimulus increases blood pressure five units; the 
alternative hypothesis is that blood pressure is not increased five units. 
Test the null hypothesis at the five per cent level by applying (a) the sign 
test, (b) Wilcoxon’s sign rank procedure, and (c) the paired-t procedure. 
(d) Compare the conclusions of the three tests. 

Table 16.6 

Order Number   Before     After          Order Number   Before     After
of Subject     Stimulus   Stimulus       of Subject     Stimulus   Stimulus
     1           113        116               9           106        115
     2           111        116              10           109        115
     3           124     [illegible]         11           109        116
     4           120     [illegible]         12           120        123
     5           115        122              13           105        104
     6           122        121              14           129        130
     7           102        105              15           103        110
     8           122        125              16           115        124

16.2. Use the data of Exercise 8.13 to test the hypothesis that the mean grade 
for method 1 is at least three units higher than the mean grade for 
method 2. Apply the sign, the paired-t, and Wilcoxon’s sign rank test 
procedures and compare results. 

16.3. Use the data of Exercise 8.14 and Wilcoxon’s sign rank procedure to 
test the hypothesis in Exercise 8.14(a). 

16.4. Use the data of Exercise 8.10 to test the hypothesis of no difference in 
tensile strength against the alternative hypothesis of unequal tensile 
strength. Apply the (a) rank-sum test, (b) U test, (c) median test, and (d) 
t test. (e) Compare the conclusions of the four test procedures. 

16.5. Along two stretches of highway, 55 miles per hour speed limit signs are 
regularly posted. The following table gives the speeds of drivers who were 
caught exceeding the limit. 

Highway I     65, 82, 60, 59, 95, 78, 62, 63 
Highway II    66, 69, 61, 67, 68, 64, 72 

Determine whether there is any difference in violator mean speed on 
the two highways. Test using the (a) rank-sum test, (b) U test, and (c) 
median test. (d) Compare the conclusions of the three test procedures, 
and explain why the t test is not appropriate. 

16.6. On a given day in two widely separated counties the maximum relative 
humidity readings (in percentages), as recorded by 13 instruments of the 
same type, were as follows: 

County A    18, 42, 2S, 26, 20, 36 
County B    30, 31, 58, 12, 65, 33, 32 

Use two test procedures to compare the mean relative humidity of the 
two counties. Discuss the pros and cons of your conclusions. 

16.7. The lengths of life minus ISOO kilowatt hours of 18 of the same type 
of electronic tube made by two manufacturers were as follows: 

Manufacturer X    38, J32, 0, J59, St, 118, 6, 78, 17), 137 
Manufacturer Y    125, 78, 79, 210, 405, 79, 811, 6t 

Compare the mean lives of the electronic tubes made by X and Y. 

16.8. Prove Eq. (16.10). 

16.9. Prove Eq. (16.13). 

16.10. Prove Eq. (16.17). 

16.11. Prove Eqs. (16.18) and (16.19). 

16.12. Prove Eq. (16.20). 

16.13. (a) Define a statistic which may be used to compare the first quartile 
of two distributions. (b) Give an illustration of the use of your statistic. 
(c) Find the mean and variance of the sampling distribution of your 
statistic. 

16.14. On inspecting 25 consecutive units on a production line, it was found 
that the sequence of defective, D, and nondefective, N, units was as follows: 

NNNNNNNfJDNNNNNODDNNNNNNNC 

Determine whether the order of the sample could be considered random. 

16.15. Thirty-five people waiting in line to buy tickets were arranged as 
follows (M denotes male and F female): 

A/MMFAfMMFAfFAfFAfFAtFFfMMFMFA/FMFXfFMAmMFAt 

Determine whether the order of the sample could be considered random. 

16.16. Eighteen cars of logs to be used as pulpwood are waiting on a track. 
If more than ten per cent of a load is hardwood, the car is denoted by 
H; otherwise, the car is denoted by S (softwood). Determine whether 
the order of the following sample can be considered random. 

16.17. Show that Eq. (16.26) holds. 

16.18. Prove Eq. (16.27). 

16.19. Prove Eq. (16.28). 

Table I 

Ordinates of the Normal Density Function 

   t    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09

  .0  .3989  .3989  .3989  .3988  .3986  .3984  .3982  .3980  .3977  .3973
  .1  .3970  .3965  .3961  .3956  .3951  .3945  .3939  .3932  .3925  .3918
  .2  .3910  .3902  .3894  .3885  .3876  .3867  .3857  .3847  .3836  .3825
  .3  .3814  .3802  .3790  .3778  .3765  .3752  .3739  .3725  .3712  .3697
  .4  .3683  .3668  .3653  .3637  .3621  .3605  .3589  .3572  .3555  .3538
  .5  .3521  .3503  .3485  .3467  .3448  .3429  .3410  .3391  .3372  .3352
  .6  .3332  .3312  .3292  .3271  .3251  .3230  .3209  .3187  .3166  .3144
  .7  .3123  .3101  .3079  .3056  .3034  .3011  .2989  .2966  .2943  .2920
  .8  .2897  .2874  .2850  .2827  .2803  .2780  .2756  .2732  .2709  .2685
  .9  .2661  .2637  .2613  .2589  .2565  .2541  .2516  .2492  .2468  .2444
 1.0  .2420  .2396  .2371  .2347  .2323  .2299  .2275  .2251  .2227  .2203
 1.1  .2179  .2155  .2131  .2107  .2083  .2059  .2036  .2012  .1989  .1965
 1.2  .1942  .1919  .1895  .1872  .1849  .1826  .1804  .1781  .1758  .1736
 1.3  .1714  .1691  .1669  .1647  .1626  .1604  .1582  .1561  .1539  .1518
 1.4  .1497  .1476  .1456  .1435  .1415  .1394  .1374  .1354  .1334  .1315
 1.5  .1295  .1276  .1257  .1238  .1219  .1200  .1182  .1163  .1145  .1127
 1.6  .1109  .1092  .1074  .1057  .1040  .1023  .1006  .0989  .0973  .0957
 1.7  .0940  .0925  .0909  .0893  .0878  .0863  .0848  .0833  .0818  .0804
 1.8  .0790  .0775  .0761  .0748  .0734  .0721  .0707  .0694  .0681  .0669
 1.9  .0656  .0644  .0632  .0620  .0608  .0596  .0584  .0573  .0562  .0551
 2.0  .0540  .0529  .0519  .0508  .0498  .0488  .0478  .0468  .0459  .0449
 2.1  .0440  .0431  .0422  .0413  .0404  .0396  .0387  .0379  .0371  .0363
 2.2  .0355  .0347  .0339  .0332  .0325  .0317  .0310  .0303  .0297  .0290
 2.3  .0283  .0277  .0270  .0264  .0258  .0252  .0246  .0241  .0235  .0229
 2.4  .0224  .0219  .0213  .0208  .0203  .0198  .0194  .0189  .0184  .0180
 2.5  .0175  .0171  .0167  .0163  .0158  .0154  .0151  .0147  .0143  .0139
 2.6  .0136  .0132  .0129  .0126  .0122  .0119  .0116  .0113  .0110  .0107
 2.7  .0104  .0101  .0099  .0096  .0093  .0091  .0088  .0086  .0084  .0081
 2.8  .0079  .0077  .0075  .0073  .0071  .0069  .0067  .0065  .0063  .0061
 2.9  .0060  .0058  .0056  .0055  .0053  .0051  .0050  .0048  .0047  .0046
 3.0  .0044  .0043  .0042  .0040  .0039  .0038  .0037  .0036  .0035  .0034
 3.1  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026  .0025  .0025
 3.2  .0024  .0023  .0022  .0022  .0021  .0020  .0020  .0019  .0018  .0018
 3.3  .0017  .0017  .0016  .0016  .0015  .0015  .0014  .0014  .0013  .0013
 3.4  .0012  .0012  .0012  .0011  .0011  .0010  .0010  .0010  .0009  .0009
 3.5  .0009  .0008  .0008  .0008  .0008  .0007  .0007  .0007  .0007  .0006
 3.6  .0006  .0006  .0006  .0005  .0005  .0005  .0005  .0005  .0005  .0004
 3.7  .0004  .0004  .0004  .0004  .0004  .0004  .0003  .0003  .0003  .0003
 3.8  .0003  .0003  .0003  .0003  .0003  .0002  .0002  .0002  .0002  .0002
 3.9  .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0001  .0001

Table II 

Cumulative Normal Distribution 

   t    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09

  .0  .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
  .1  .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
  .2  .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
  .3  .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
  .4  .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
  .5  .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
  .6  .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
  .7  .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
  .8  .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
  .9  .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
 1.0  .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
 1.1  .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
 1.2  .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
 1.3  .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
 1.4  .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
 1.5  .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
 1.6  .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
 1.7  .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
 1.8  .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
 1.9  .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
 2.0  .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
 2.1  .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
 2.2  .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
 2.3  .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
 2.4  .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
 2.5  .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
 2.6  .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
 2.7  .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
 2.8  .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
 2.9  .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
 3.0  .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
 3.1  .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
 3.2  .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
 3.3  .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
 3.4  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998

 t              1.282  1.645  1.960  2.326  2.576  3.090  3.291   3.891    4.417
 N(t)             .90    .95   .975    .99   .995   .999  .9995  .99995  .999995
 2[1 - N(t)]      .20    .10    .05    .02    .01   .002   .001   .0001   .00001

Confidence Belts for Proportions* (cont.) 
(Confidence coefficient .99) 

* This chart is reproduced from C. J. Clopper and E. S. Pearson, “The Use of 
Confidence or Fiducial Limits Illustrated in the Case of the Binomial,” Biometrika, 
Vol. 26 (1934), p. 404, with the permission of Professor E. S. Pearson. 

Table IV 

Percentage Points of the χ² Distribution* 

* This table is abridged from “Table of Percentage Points of the χ² Distribution,” 
Biometrika, Vol. 32 (1941), pp. 188-89, with the permission of Professor E. S. Pearson. 

Table IV 


Percentage Points of the χ² Distribution (cont.) 

   ν   .500   .250   .100   .050   .025   .010   .005

   1   .455   1.32   2.71   3.84   5.02   6.63   7.88
   2   1.39   2.77   4.61   5.99   7.38   9.21   10.6
   3   2.37   4.11   6.25   7.81   9.35   11.3   12.8
   4   3.36   5.39   7.78   9.49   11.1   13.3   14.9
   5   4.35   6.63   9.24   11.1   12.8   15.1   16.7
   6   5.35   7.84   10.6   12.6   14.4   16.8   18.5
   7   6.35   9.04   12.0   14.1   16.0   18.5   20.3
   8   7.34   10.2   13.4   15.5   17.5   20.1   22.0
   9   8.34   11.4   14.7   16.9   19.0   21.7   23.6
  10   9.34   12.5   16.0   18.3   20.5   23.2   25.2
  11   10.3   13.7   17.3   19.7   21.9   24.7   26.8
  12   11.3   14.8   18.5   21.0   23.3   26.2   28.3
  13   12.3   16.0   19.8   22.4   24.7   27.7   29.8
  14   13.3   17.1   21.1   23.7   26.1   29.1   31.3
  15   14.3   18.2   22.3   25.0   27.5   30.6   32.8
  16   15.3   19.4   23.5   26.3   28.8   32.0   34.3
  17   16.3   20.5   24.8   27.6   30.2   33.4   35.7
  18   17.3   21.6   26.0   28.9   31.5   34.8   37.2
  19   18.3   22.7   27.2   30.1   32.9   36.2   38.6
  20   19.3   23.8   28.4   31.4   34.2   37.6   40.0
  21   20.3   24.9   29.6   32.7   35.5   38.9   41.4
  22   21.3   26.0   30.8   33.9   36.8   40.3   42.8
  23   22.3   27.1   32.0   35.2   38.1   41.6   44.2
  24   23.3   28.2   33.2   36.4   39.4   43.0   45.6
  25   24.3   29.3   34.4   37.7   40.6   44.3   46.9
  26   25.3   30.4   35.6   38.9   41.9   45.6   48.3
  27   26.3   31.5   36.7   40.1   43.2   47.0   49.6
  28   27.3   32.6   37.9   41.3   44.5   48.3   51.0
  29   28.3   33.7   39.1   42.6   45.7   49.6   52.3
  30   29.3   34.8   40.3   43.8   47.0   50.9   53.7
  40   39.3   45.6   51.8   55.8   59.3   63.7   66.8
  50   49.3   56.3   63.2   67.5   71.4   76.2   79.5
  60   59.3   67.0   74.4   79.1   83.3   88.4   92.0

Table VI 


Percentage Points of the t Distribution* 

 ν \ α    .25     .20     .15     .10     .05    .025     .01    .005   .0005

   1    1.000   1.376   1.963   3.078   6.314  12.706  31.821  63.657 636.619
   2     .816   1.061   1.386   1.886   2.920   4.303   6.965   9.925  31.598
   3     .765    .978   1.250   1.638   2.353   3.182   4.541   5.841  12.941
   4     .741    .941   1.190   1.533   2.132   2.776   3.747   4.604   8.610
   5     .727    .920   1.156   1.476   2.015   2.571   3.365   4.032   6.859
   6     .718    .906   1.134   1.440   1.943   2.447   3.143   3.707   5.959
   7     .711    .896   1.119   1.415   1.895   2.365   2.998   3.499   5.405
   8     .706    .889   1.108   1.397   1.860   2.306   2.896   3.355   5.041
   9     .703    .883   1.100   1.383   1.833   2.262   2.821   3.250   4.781
  10     .700    .879   1.093   1.372   1.812   2.228   2.764   3.169   4.587
  11     .697    .876   1.088   1.363   1.796   2.201   2.718   3.106   4.437
  12     .695    .873   1.083   1.356   1.782   2.179   2.681   3.055   4.318
  13     .694    .870   1.079   1.350   1.771   2.160   2.650   3.012   4.221
  14     .692    .868   1.076   1.345   1.761   2.145   2.624   2.977   4.140
  15     .691    .866   1.074   1.341   1.753   2.131   2.602   2.947   4.073
  16     .690    .865   1.071   1.337   1.746   2.120   2.583   2.921   4.015
  17     .689    .863   1.069   1.333   1.740   2.110   2.567   2.898   3.965
  18     .688    .862   1.067   1.330   1.734   2.101   2.552   2.878   3.922
  19     .688    .861   1.066   1.328   1.729   2.093   2.539   2.861   3.883
  20     .687    .860   1.064   1.325   1.725   2.086   2.528   2.845   3.850
  21     .686    .859   1.063   1.323   1.721   2.080   2.518   2.831   3.819
  22     .686    .858   1.061   1.321   1.717   2.074   2.508   2.819   3.792
  23     .685    .858   1.060   1.319   1.714   2.069   2.500   2.807   3.767
  24     .685    .857   1.059   1.318   1.711   2.064   2.492   2.797   3.745
  25     .684    .856   1.058   1.316   1.708   2.060   2.485   2.787   3.725
  26     .684    .856   1.058   1.315   1.706   2.056   2.479   2.779   3.707
  27     .684    .855   1.057   1.314   1.703   2.052   2.473   2.771   3.690
  28     .683    .855   1.056   1.313   1.701   2.048   2.467   2.763   3.674
  29     .683    .854   1.055   1.311   1.699   2.045   2.462   2.756   3.659
  30     .683    .854   1.055   1.310   1.697   2.042   2.457   2.750   3.646
  40     .681    .851   1.050   1.303   1.684   2.021   2.423   2.704   3.551
  60     .679    .848   1.046   1.296   1.671   2.000   2.390   2.660   3.460
 120     .677    .845   1.041   1.289   1.658   1.980   2.358   2.617   3.373
   ∞     .674    .842   1.036   1.282   1.645   1.960   2.326   2.576   3.291

* This table is abridged from Table III of R. A. Fisher and Frank Yates, Statistical 
Tables for Biological, Agricultural and Medical Research, 5th ed., published by Oliver 
and Boyd, Ltd., Edinburgh. By permission of the authors and publishers. 

Percentage Points of the F Distribution* 

* This table is abridged from Maxine Merrington and Catherine Thompson, “Tables 
of Percentage Points of the Inverted Beta Distribution,” Biometrika, Vol. 33 (1943), pp. 
73-88, with permission of Professor E. S. Pearson. 

Table VII 

Percentage Points of the F Distribution (cont.) 
α = 0.50 

 ν2 \ ν1   10     12     15     20     24     30     40     60    120      ∞

   1     2.04   2.07   2.09   2.12   2.13   2.15   2.16   2.17   2.18   2.20
   2     1.35   1.36   1.38   1.39   1.40   1.41   1.42   1.43   1.43   1.44
   3     1.18   1.20   1.21   1.23   1.23   1.24   1.25   1.25   1.26   1.27
   4     1.11   1.13   1.14   1.15   1.16   1.16   1.17   1.18   1.18   1.19
   5     1.07   1.09   1.10   1.11   1.12   1.12   1.13   1.14   1.14   1.15
   6     1.05   1.06   1.07   1.08   1.09   1.10   1.10   1.11   1.12   1.12
   7     1.03   1.04   1.05   1.06   1.07   1.08   1.08   1.09   1.10   1.10
   8     1.02   1.03   1.04   1.05   1.06   1.07   1.07   1.08   1.08   1.09
   9     1.01   1.02   1.03   1.04   1.05   1.05   1.06   1.07   1.07   1.08
  10     1.00   1.01   1.02   1.03   1.04   1.05   1.05   1.06   1.06   1.07
  11     .994   1.01   1.02   1.03   1.03   1.04   1.05   1.05   1.06   1.06
  12     .989   1.00   1.01   1.02   1.03   1.03   1.04   1.05   1.05   1.06
  13     .984   .996   1.01   1.02   1.02   1.03   1.04   1.04   1.05   1.05
  14     .981   .992   1.00   1.01   1.02   1.02   1.03   1.04   1.04   1.05
  15     .977   .989   1.00   1.01   1.02   1.02   1.03   1.03   1.04   1.05
  16     .975   .986   .997   1.01   1.01   1.02   1.03   1.03   1.04   1.04
  17     .972   .983   .995   1.01   1.01   1.02   1.02   1.03   1.03   1.04
  18     .970   .981   .992   1.00   1.01   1.02   1.02   1.03   1.03   1.04
  19     .968   .979   .990   1.00   1.01   1.01   1.02   1.02   1.03   1.04
  20     .966   .977   .989   1.00   1.01   1.01   1.02   1.02   1.03   1.03
  21     .965   .976   .987   .998   1.00   1.01   1.02   1.02   1.03   1.03
  22     .963   .974   .986   .997   1.00   1.01   1.01   1.02   1.03   1.03
  23     .962   .973   .984   .996   1.00   1.01   1.01   1.02   1.02   1.03
  24     .961   .972   .983   .994   1.00   1.01   1.01   1.02   1.02   1.03
  25     .960   .971   .982   .993   1.00   1.00   1.01   1.02   1.02   1.03
  26     .959   .970   .981   .992   1.00   1.00   1.01   1.01   1.02   1.03
  27     .958   .969   .980   .991   1.00   1.00   1.01   1.01   1.02   1.03
  28     .957   .968   .979   .990   1.00   1.00   1.01   1.01   1.02   1.02
  29     .956   .967   .978   .990   1.00   1.00   1.01   1.01   1.02   1.02
  30     .955   .966   .978   .989   .994   1.00   1.01   1.01   1.02   1.02
  40     .950   .961   .972   .983   .989   .994   1.00   1.01   1.01   1.02
  60     .945   .956   .967   .978   .983   .989   .994   1.00   1.01   1.01
 120     .939   .950   .961   .972   .978   .983   .989   .994   1.00   1.01
   ∞     .934   .945   .956   .967   .972   .978   .983   .989   .994   1.00

Table VII 

Percentage Points of the F Distribution (cont.) 
α = 0.25 

 ν2 \ ν1   10     12     15     20     24     30     40     60    120      ∞

   1     9.32   9.41   9.49   9.58   9.63   9.67   9.71   9.76   9.80   9.85
   2     3.38   3.39   3.41   3.43   3.43   3.44   3.45   3.46   3.47   3.48
   3     2.44   2.45   2.46   2.46   2.46   2.47   2.47   2.47   2.47   2.47
   4     2.08   2.08   2.08   2.08   2.08   2.08   2.08   2.08   2.08   2.08
   5     1.89   1.89   1.89   1.88   1.88   1.88   1.88   1.87   1.87   1.87
   6     1.77   1.77   1.76   1.76   1.75   1.75   1.75   1.74   1.74   1.74
   7     1.69   1.68   1.68   1.67   1.67   1.66   1.66   1.65   1.65   1.65
   8     1.63   1.62   1.62   1.61   1.60   1.60   1.59   1.59   1.58   1.58
   9     1.59   1.58   1.57   1.56   1.56   1.55   1.54   1.54   1.53   1.53
  10     1.55   1.54   1.53   1.52   1.52   1.51   1.51   1.50   1.49   1.48
  11     1.52   1.51   1.50   1.49   1.49   1.48   1.47   1.47   1.46   1.45
  12     1.50   1.49   1.48   1.47   1.46   1.45   1.45   1.44   1.43   1.42
  13     1.48   1.47   1.46   1.45   1.44   1.43   1.42   1.42   1.41   1.40
  14     1.46   1.45   1.44   1.43   1.42   1.41   1.41   1.40   1.39   1.38
  15     1.45   1.44   1.43   1.41   1.41   1.40   1.39   1.38   1.37   1.36
  16     1.44   1.43   1.41   1.40   1.39   1.38   1.37   1.36   1.35   1.34
  17     1.43   1.41   1.40   1.39   1.38   1.37   1.36   1.35   1.34   1.33
  18     1.42   1.40   1.39   1.38   1.37   1.36   1.35   1.34   1.33   1.32
  19     1.41   1.40   1.38   1.37   1.36   1.35   1.34   1.33   1.32   1.30
  20     1.40   1.39   1.37   1.36   1.35   1.34   1.33   1.32   1.31   1.29
  21     1.39   1.38   1.37   1.35   1.34   1.33   1.32   1.31   1.30   1.28
  22     1.39   1.37   1.36   1.34   1.33   1.32   1.31   1.30   1.29   1.28
  23     1.38   1.37   1.35   1.34   1.33   1.32   1.31   1.30   1.28   1.27
  24     1.38   1.36   1.35   1.33   1.32   1.31   1.30   1.29   1.28   1.26
  25     1.37   1.36   1.34   1.33   1.32   1.31   1.29   1.28   1.27   1.25
  26     1.37   1.35   1.34   1.32   1.31   1.30   1.29   1.28   1.26   1.25
  27     1.36   1.35   1.33   1.32   1.31   1.30   1.28   1.27   1.26   1.24
  28     1.36   1.34   1.33   1.31   1.30   1.29   1.28   1.27   1.25   1.24
  29     1.35   1.34   1.32   1.31   1.30   1.29   1.27   1.26   1.25   1.23
  30     1.35   1.34   1.32   1.30   1.29   1.28   1.27   1.26   1.24   1.23
  40     1.33   1.31   1.30   1.28   1.26   1.25   1.24   1.22   1.21   1.19
  60     1.30   1.29   1.27   1.25   1.24   1.22   1.21   1.19   1.17   1.15
 120     1.28   1.26   1.24   1.22   1.21   1.19   1.18   1.16   1.13   1.10
   ∞     1.25   1.24   1.22   1.19   1.18   1.16   1.14   1.12   1.08   1.00

Table VII 

Percentage Points of the F Distribution (cont.) 
α = 0.10 

 ν2 \ ν1   10     12     15     20     24     30     40     60    120      ∞

   1    60.19  60.71  61.22  61.74  62.00  62.26  62.53  62.79  63.06  63.33
   2     9.39   9.41   9.42   9.44   9.45   9.46   9.47   9.47   9.48   9.49
   3     5.23   5.22   5.20   5.18   5.18   5.17   5.16   5.15   5.14   5.13
   4     3.92   3.90   3.87   3.84   3.83   3.82   3.80   3.79   3.78   3.76
   5     3.30   3.27   3.24   3.21   3.19   3.17   3.16   3.14   3.12   3.10
   6     2.94   2.90   2.87   2.84   2.82   2.80   2.78   2.76   2.74   2.72
   7     2.70   2.67   2.63   2.59   2.58   2.56   2.54   2.51   2.49   2.47
   8     2.54   2.50   2.46   2.42   2.40   2.38   2.36   2.34   2.32   2.29
   9     2.42   2.38   2.34   2.30   2.28   2.25   2.23   2.21   2.18   2.16
  10     2.32   2.28   2.24   2.20   2.18   2.16   2.13   2.11   2.08   2.06
  11     2.25   2.21   2.17   2.12   2.10   2.08   2.05   2.03   2.00   1.97
  12     2.19   2.15   2.10   2.06   2.04   2.01   1.99   1.96   1.93   1.90
  13     2.14   2.10   2.05   2.01   1.98   1.96   1.93   1.90   1.88   1.85
  14     2.10   2.05   2.01   1.96   1.94   1.91   1.89   1.86   1.83   1.80
  15     2.06   2.02   1.97   1.92   1.90   1.87   1.85   1.82   1.79   1.76
  16     2.03   1.99   1.94   1.89   1.87   1.84   1.81   1.78   1.75   1.72
  17     2.00   1.96   1.91   1.86   1.84   1.81   1.78   1.75   1.72   1.69
  18     1.98   1.93   1.89   1.84   1.81   1.78   1.75   1.72   1.69   1.66
  19     1.96   1.91   1.86   1.81   1.79   1.76   1.73   1.70   1.67   1.63
  20     1.94   1.89   1.84   1.79   1.77   1.74   1.71   1.68   1.64   1.61
  21     1.92   1.87   1.83   1.78   1.75   1.72   1.69   1.66   1.62   1.59
  22     1.90   1.86   1.81   1.76   1.73   1.70   1.67   1.64   1.60   1.57
  23     1.89   1.84   1.80   1.74   1.72   1.69   1.66   1.62   1.59   1.55
  24     1.88   1.83   1.78   1.73   1.70   1.67   1.64   1.61   1.57   1.53
  25     1.87   1.82   1.77   1.72   1.69   1.66   1.63   1.59   1.56   1.52
  26     1.86   1.81   1.76   1.71   1.68   1.65   1.61   1.58   1.54   1.50
  27     1.85   1.80   1.75   1.70   1.67   1.64   1.60   1.57   1.53   1.49
  28     1.84   1.79   1.74   1.69   1.66   1.63   1.59   1.56   1.52   1.48
  29     1.83   1.78   1.73   1.68   1.65   1.62   1.58   1.55   1.51   1.47
  30     1.82   1.77   1.72   1.67   1.64   1.61   1.57   1.54   1.50   1.46
  40     1.76   1.71   1.66   1.61   1.57   1.54   1.51   1.47   1.42   1.38
  60     1.71   1.66   1.60   1.54   1.51   1.48   1.44   1.40   1.35   1.29
 120     1.65   1.60   1.55   1.48   1.45   1.41   1.37   1.32   1.26   1.19
   ∞     1.60   1.55   1.49   1.42   1.38   1.34   1.30   1.24   1.17   1.00

Table VII 


Percentage Points of the F Distribution (cont.) 
α = 0.05 

 ν2 \ ν1   10     12     15     20     24     30     40     60    120      ∞

   1      242    244    246    248    249    250    251    252    253    254
   2     19.4   19.4   19.4   19.4   19.5   19.5   19.5   19.5   19.5   19.5
   3     8.79   8.74   8.70   8.66   8.64   8.62   8.59   8.57   8.55   8.53
   4     5.96   5.91   5.86   5.80   5.77   5.75   5.72   5.69   5.66   5.63
   5     4.74   4.68   4.62   4.56   4.53   4.50   4.46   4.43   4.40   4.36
   6     4.06   4.00   3.94   3.87   3.84   3.81   3.77   3.74   3.70   3.67
   7     3.64   3.57   3.51   3.44   3.41   3.38   3.34   3.30   3.27   3.23
   8     3.35   3.28   3.22   3.15   3.12   3.08   3.04   3.00   2.97   2.93
   9     3.14   3.07   3.01   2.94   2.90   2.86   2.83   2.79   2.75   2.71
  10     2.98   2.91   2.84   2.77   2.74   2.70   2.66   2.62   2.58   2.54
  11     2.85   2.79   2.72   2.65   2.61   2.57   2.53   2.49   2.45   2.40
  12     2.75   2.69   2.62   2.54   2.51   2.47   2.43   2.38   2.34   2.30
  13     2.67   2.60   2.53   2.46   2.42   2.38   2.34   2.30   2.25   2.21
  14     2.60   2.53   2.46   2.39   2.35   2.31   2.27   2.22   2.18   2.13
  15     2.54   2.48   2.40   2.33   2.29   2.25   2.20   2.16   2.11   2.07
  16     2.49   2.42   2.35   2.28   2.24   2.19   2.15   2.11   2.06   2.01
  17     2.45   2.38   2.31   2.23   2.19   2.15   2.10   2.06   2.01   1.96
  18     2.41   2.34   2.27   2.19   2.15   2.11   2.06   2.02   1.97   1.92
  19     2.38   2.31   2.23   2.16   2.11   2.07   2.03   1.98   1.93   1.88
  20     2.35   2.28   2.20   2.12   2.08   2.04   1.99   1.95   1.90   1.84
  21     2.32   2.25   2.18   2.10   2.05   2.01   1.96   1.92   1.87   1.81
  22     2.30   2.23   2.15   2.07   2.03   1.98   1.94   1.89   1.84   1.78
  23     2.27   2.20   2.13   2.05   2.00   1.96   1.91   1.86   1.81   1.76
  24     2.25   2.18   2.11   2.03   1.98   1.94   1.89   1.84   1.79   1.73
  25     2.24   2.16   2.09   2.01   1.96   1.92   1.87   1.82   1.77   1.71
  26     2.22   2.15   2.07   1.99   1.95   1.90   1.85   1.80   1.75   1.69
  27     2.20   2.13   2.06   1.97   1.93   1.88   1.84   1.79   1.73   1.67
  28     2.19   2.12   2.04   1.96   1.91   1.87   1.82   1.77   1.71   1.65
  29     2.18   2.10   2.03   1.94   1.90   1.85   1.81   1.75   1.70   1.64
  30     2.16   2.09   2.01   1.93   1.89   1.84   1.79   1.74   1.68   1.62
  40     2.08   2.00   1.92   1.84   1.79   1.74   1.69   1.64   1.58   1.51
  60     1.99   1.92   1.84   1.75   1.70   1.65   1.59   1.53   1.47   1.39
 120     1.91   1.83   1.75   1.66   1.61   1.55   1.50   1.43   1.35   1.25
   ∞     1.83   1.75   1.67   1.57   1.52   1.46   1.39   1.32   1.22   1.00

Table VII 

Percentage Points of the F Distribution 
α = 0.025 

Table VII 

Percentage Points of the F Distribution (cont.) 
α = 0.025 


 ν2 \ ν1   10     12     15     20     24     30     40     60    120      ∞

   1      969    977    985    993    997   1001   1006   1010   1014   1018
   2     39.4   39.4   39.4   39.4   39.5   39.5   39.5   39.5   39.5   39.5
   3     14.4   14.3   14.3   14.2   14.1   14.1   14.0   14.0   13.9   13.9
   4     8.84   8.75   8.66   8.56   8.51   8.46   8.41   8.36   8.31   8.26
   5     6.62   6.52   6.43   6.33   6.28   6.23   6.18   6.12   6.07   6.02
   6     5.46   5.37   5.27   5.17   5.12   5.07   5.01   4.96   4.90   4.85
   7     4.76   4.67   4.57   4.47   4.42   4.36   4.31   4.25   4.20   4.14
   8     4.30   4.20   4.10   4.00   3.95   3.89   3.84   3.78   3.73   3.67
   9     3.96   3.87   3.77   3.67   3.61   3.56   3.51   3.45   3.39   3.33
  10     3.72   3.62   3.52   3.42   3.37   3.31   3.26   3.20   3.14   3.08
  11     3.53   3.43   3.33   3.23   3.17   3.12   3.06   3.00   2.94   2.88
  12     3.37   3.28   3.18   3.07   3.02   2.96   2.91   2.85   2.79   2.72
  13     3.25   3.15   3.05   2.95   2.89   2.84   2.78   2.72   2.66   2.60
  14     3.15   3.05   2.95   2.84   2.79   2.73   2.67   2.61   2.55   2.49
  15     3.06   2.96   2.86   2.76   2.70   2.64   2.58   2.52   2.46   2.40
  16     2.99   2.89   2.79   2.68   2.63   2.57   2.51   2.45   2.38   2.32
  17     2.92   2.82   2.72   2.62   2.56   2.50   2.44   2.38   2.32   2.25
  18     2.87   2.77   2.67   2.56   2.50   2.44   2.38   2.32   2.26   2.19
  19     2.82   2.72   2.62   2.51   2.45   2.39   2.33   2.27   2.20   2.13
  20     2.77   2.68   2.57   2.46   2.41   2.35   2.29   2.22   2.16   2.09
  21     2.73   2.64   2.53   2.42   2.37   2.31   2.25   2.18   2.11   2.04
  22     2.70   2.60   2.50   2.39   2.33   2.27   2.21   2.14   2.08   2.00
  23     2.67   2.57   2.47   2.36   2.30   2.24   2.18   2.11   2.04   1.97
  24     2.64   2.54   2.44   2.33   2.27   2.21   2.15   2.08   2.01   1.94
  25     2.61   2.51   2.41   2.30   2.24   2.18   2.12   2.05   1.98   1.91
  26     2.59   2.49   2.39   2.28   2.22   2.16   2.09   2.03   1.95   1.88
  27     2.57   2.47   2.36   2.25   2.19   2.13   2.07   2.00   1.93   1.85
  28     2.55   2.45   2.34   2.23   2.17   2.11   2.05   1.98   1.91   1.83
  29     2.53   2.43   2.32   2.21   2.15   2.09   2.03   1.96   1.89   1.81
  30     2.51   2.41   2.31   2.20   2.14   2.07   2.01   1.94   1.87   1.79
  40     2.39   2.29   2.18   2.07   2.01   1.94   1.88   1.80   1.72   1.64

2.27 

2.17 

2.06 

1.94 

1.88 

1.82 

1.74 

1.67 

1.58 

1.48 

2.16 

2.05 

1.94 

1.82 

1.76 

1.69 

1.61 

1.53 

1.43 

1.31 

2.05 

1.94 

1.83 

1.71 

1.64 

1.57 

1.48 

1.39 

1.27 

1.00 
Table VII 

Percentage Points of the F Distribution (cont.) 
α = 0.01 


10 

12 

15 

20 

24 

30 

40 

60 

120 

∞ 

6056 

6106 

6157 

6209 

6235 

6261 

6287 

6313 

6339 

6366 

99.4 

99.4 

99.4 

99.4 

99.5 

99.5 

99.5 

99.5 

99.5 

99.5 

27.2 

27.1 

26.9 

26.7 

26.6 

26.5 

26.4 

26.3 

26.2 

26.1 

14.5 

14.4 

14.2 

14.0 

13.9 

13.8 

13.7 

13.7 

13.6 

13.5 

10.1 

9.89 

9.72 

9.55 

9.47 

9.38 

9.29 

9.20 

9.11 

9.02 

7.87 

7.72 

7.56 

7.40 

7.31 

7.23 

7.14 

7.06 

6.97 

6.88 

6.62 

6.47 

6.31 

6.16 

6.07 

5.99 

5.91 

5.82 

5.74 

5.65 

5.81 

5.67 

5.52 

5.36 

5.28 

5.20 

5.12 

5.03 

4.95 

4.86 

5.26 

5.11 

4.96 

4.81 

4.73 

4.65 

4.57 

4.48 

4.40 

4.31 

4.85 

4.71 

4.56 

4.41 

4.33 

4.25 

4.17 

4.08 

4.00 

3.91 

4.54 

4.40 

4.25 

4.10 

4.02 

3.94 

3.86 

3.78 

3.69 

3.60 

4.30 

4.16 

4.01 

3.86 

3.78 

3.70 

3.62 

3.54 

3.45 

3.36 

4.10 

3.96 

3.82 

3.66 

3.59 

3.51 

3.43 

3.34 

3.25 

3.17 

3.94 

3.80 

3.66 

3.51 

3.43 

3.35 

3.27 

3.18 

3.09 

3.00 

3.80 

3.67 

3.52 

3.37 

3.29 

3.21 

3.13 

3.05 

2.96 

2.87 

3.69 

3.55 

3.41 

3.26 

3.18 

3.10 

3.02 

2.93 

2.84 

2.75 

3.59 

3.46 

3.31 

3.16 

3.08 

3.00 

2.92 

2.83 

2.75 

2.65 

3.51 

3.37 

3.23 

3.08 

3.00 

2.92 

2.84 

2.75 

2.66 

2.57 

3.43 

3.30 

3.15 

3.00 

2.92 

2.84 

2.76 

2.67 

2.58 

2.49 

3.37 

3.23 

3.09 

2.94 

2.86 

2.78 

2.69 

2.61 

2.52 

2.42 

3.31 

3.17 

3.03 

2.88 

2.80 

2.72 

2.64 

2.55 

2.46 

2.36 

3.26 

3.12 

2.98 

2.83 

2.75 

2.67 

2.58 

2.50 

2.40 

2.31 

3.21 

3.07 

2.93 

2.78 

2.70 

2.62 

2.54 

2.45 

2.35 

2.26 

3.17 

3.03 

2.89 

2.74 

2.66 

2.58 

2.49 

2.40 

2.31 

2.21 

3.13 

2.99 

2.85 

2.70 

2.62 

2.54 

2.45 

2.36 

2.27 

2.17 

3.09 

2.96 

2.82 

2.66 

2.58 

2.50 

2.42 

2.33 

2.23 

2.13 

3.06 

2.93 

2.78 

2.63 

2.55 

2.47 

2.38 

2.29 

2.20 

2.10 

3.03 

2.90 

2.75 

2.60 

2.52 

2.44 

2.35 

2.26 

2.17 

2.06 

3.00 

2.87 

2.73 

2.57 

2.49 

2.41 

2.33 

2.23 

2.14 

2.03 

2.98 

2.84 

2.70 

2.55 

2.47 

2.39 

2.30 

2.21 

2.11 

2.01 

2.80 

2.66 

2.52 

2.37 

2.29 

2.20 

2.11 

2.02 

1.92 

1.80 

2.63 

2.50 

2.35 

2.20 

2.12 

2.03 

1.94 

1.84 

1.73 

1.60 

2.47 

2.34 

2.19 

2.03 

1.95 

1.86 

1.76 

1.66 

1.53 

1.38 

2.32 

2.18 

2.04 

1.88 

1.79 

1.70 

1.59 

1.47 

1.32 

1.00 
Table VII 

Percentage Points of the F Distribution (cont.) 
α = 0.005 


10 

12 

15 

20 

24 

30 

40 

60 

120 

∞ 

24224 

24426 

24630 

24836 

24940 

25044 

25148 

25253 

25359 

25465 

199 

199 

199 

199 

199 

199 

199 

199 

199 

200 

43.7 

43.4 

43.1 

42.8 

42.6 

42.5 

42.3 

42.1 

42.0 

41.8 

21.0 

20.7 

20.4 

20.2 

20.0 

19.9 

19.8 

19.6 

19.5 

19.3 

13.6 

13.4 

13.1 

12.9 

12.8 

12.7 

12.5 

12.4 

12.3 

12.1 

10.2 

10.0 

9.81 

9.59 

9.47 

9.36 

9.24 

9.12 

9.00 

8.88 

8.38 

8.18 

7.97 

7.75 

7.64 

7.53 

7.42 

7.31 

7.19 

7.08 

7.21 

7.01 

6.81 

6.61 

6.50 

6.40 

6.29 

6.18 

6.06 

5.95 

6.42 

6.23 

6.03 

5.83 

5.73 

5.62 

5.52 

5.41 

5.30 

5.19 

5.85 

5.66 

5.47 

5.27 

5.17 

5.07 

4.97 

4.86 

4.75 

4.64 

5.42 

5.24 

5.05 

4.86 

4.76 

4.65 

4.55 

4.44 

4.34 

4.23 

5.09 

4.91 

4.72 

4.53 

4.43 

4.34 

4.23 

4.12 

4.01 

3.90 

4.82 

4.64 

4.46 

4.27 

4.17 

4.07 

3.97 

3.87 

3.76 

3.65 

4.60 

4.43 

4.25 

4.06 

3.96 

3.86 

3.76 

3.66 

3.55 

3.44 

4.42 

4.25 

4.07 

3.88 

3.79 

3.69 

3.58 

3.48 

3.37 

3.26 

4.27 

4.10 

3.92 

3.73 

3.64 

3.54 

3.44 

3.33 

3.22 

3.11 

4.14 

3.97 

3.79 

3.61 

3.51 

3.41 

3.31 

3.21 

3.10 

2.98 

4.03 

3.86 

3.68 

3.50 

3.40 

3.30 

3.20 

3.10 

2.99 

2.87 

3.93 

3.76 

3.59 

3.40 

3.31 

3.21 

3.11 

3.00 

2.89 

2.78 

3.85 

3.68 

3.50 

3.32 

3.22 

3.12 

3.02 

2.92 

2.81 

2.69 

3.77 

3.60 

3.43 

3.24 

3.15 

3.05 

2.95 

2.84 

2.73 

2.61 

3.70 

3.54 

3.36 

3.18 

3.08 

2.98 

2.88 

2.77 

2.66 

2.55 

3.64 

3.47 

3.30 

3.12 

3.02 

2.92 

2.82 

2.71 

2.60 

2.48 

3.59 

3.42 

3.25 

3.06 

2.97 

2.87 

2.77 

2.66 

2.55 

2.43 

3.54 

3.37 

3.20 

3.01 

2.92 

2.82 

2.72 

2.61 

2.50 

2.38 

3.49 

3.33 

3.15 

2.97 

2.87 

2.77 

2.67 

2.56 

2.45 

2.33 

3.45 

3.28 

3.11 

2.93 

2.83 

2.73 

2.63 

2.52 

2.41 

2.29 

3.41 

3.25 

3.07 

2.89 

2.79 

2.69 

2.59 

2.48 

2.37 

2.25 

3.38 

3.21 

3.04 

2.86 

2.76 

2.66 

2.56 

2.45 

2.33 

2.21 

3.34 

3.18 

3.01 

2.82 

2.73 

2.63 

2.52 

2.42 

2.30 

2.18 

3.12 

2.95 

2.78 

2.60 

2.50 

2.40 

2.30 

2.18 

2.06 

1.93 

2.90 

2.74 

2.57 

2.39 

2.29 

2.19 

2.08 

1.96 

1.83 

1.69 

2.71 

2.54 

2.37 

2.19 

2.09 

1.98 

1.87 

1.75 

1.61 

1.43 

2.52 

2.36 

2.19 

2.00 

1.90 

1.79 

1.67 

1.53 

1.36 

1.00 
This table is reproduced from W. J. Dixon and F. J. Massey, Jr., Introduction to 
Statistical Analysis, 2nd ed., McGraw-Hill, Inc., New York, 1957, Table A-13, pp. 425-33, 
Table IX 

Percentage Points of the Studentized Range* 
α = 0.05 



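* This table is abridged from E. S. Pearson and H. O. 
Hartley, Biometrika Tables for Statisticians, Vol. 1 (1954), 
Cambridge University Press for the Biometrika Trustees, 
p. 176, Table 29 (upper percentage points), with permission 
of Professor E. S. Pearson. 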
Table IX 

Percentage Points of the Studentized Range (cont.) 
α = 0.05 


10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 

6.99 

7.17 

7.32 

7.47 

7.60 

7.72 

7.83 

7.93 

8.03 

8.12 

8.21 

6.49 

6.65 

6.79 

6.92 

7.03 

7.14 

7.24 

7.34 

7.43 

7.51 

7.59 

6.16 

6.30 

6.43 

6.55 

6.66 

6.76 

6.85 

6.94 

7.02 

7.09 

7.17 

5.92 

6.05 

6.18 

6.29 

6.39 

6.48 

6.57 

6.65 

6.73 

6.80 

6.87 

5.74 

5.87 

5.98 

6.09 

6.19 

6.28 

6.36 

6.44 

6.51 

6.58 

6.64 

5.60 

5.72 

5.83 

5.93 

6.03 

6.11 

6.20 

6.27 

6.34 

6.40 

6.47 

5.49 

5.61 

5.71 

5.81 

5.90 

5.99 

6.06 

6.14 

6.20 

6.26 

6.33 

5.40 

5.51 

5.62 

5.71 

5.80 

5.88 

5.95 

6.03 

6.09 

6.15 

6.21 

5.32 

5.43 

5.53 

5.63 

5.71 

5.79 

5.86 

5.93 

6.00 

6.05 

6.11 

5.25 

5.36 

5.46 

5.55 

5.64 

5.72 

5.79 

5.85 

5.92 

5.98 

6.03 

5.20 

5.31 

5.40 

5.49 

5.58 

5.65 

5.72 

5.79 

5.85 

5.90 

5.96 

5.15 

5.26 

5.35 

5.44 

5.52 

5.59 

5.66 

5.72 

5.79 

5.84 

5.90 

5.11 

5.21 

5.31 

5.39 

5.47 

5.55 

5.61 

5.68 

5.74 

5.79 

5.84 

5.07 

5.17 

5.27 

5.35 

5.43 

5.50 

5.57 

5.63 

5.69 

5.74 

5.79 

5.04 

5.14 

5.23 

5.32 

5.39 

5.46 

5.53 

5.59 

5.65 

5.70 

5.75 

5.01 

5.11 

5.20 

5.28 

5.36 

5.43 

5.49 

5.55 

5.61 

5.66 

5.71 

4.92 

5.01 

5.10 

5.18 

5.25 

5.32 

5.38 

5.44 

5.50 

5.54 

5.59 

4.83 

4.92 

5.00 

5.08 

5.15 

5.21 

5.27 

5.33 

5.38 

5.43 

5.48 

4.74 

4.82 

4.91 

4.98 

5.05 

5.11 

5.16 

5.22 

5.27 

5.31 

5.36 

4.65 

4.73 

4.81 

4.88 

4.94 

5.00 

5.06 

5.11 

5.16 

5.20 

5.24 

4.56 

4.64 

4.72 

4.78 

4.84 

4.90 

4.95 

5.00 

5.05 

5.09 

5.13 

4.47 

4.55 

4.62 

4.68 

4.74 

4.80 

4.85 

4.89 

4.93 

4.97 

5.01 
Percentage Points of the Studentized Range* 
α = 0.01 



* This table is abridged from E. S. Pearson and H. O. 
Hartley, Biometrika Tables for Statisticians, Vol. 1 (1954), 
Cambridge University Press for the Biometrika Trustees, 
p. 177, Table 29 (upper percentage points), with permission 
of Professor E. S. Pearson. 
Table IX 

Percentage Points of the Studentized Range (cont.) 

α = 0.01 


10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 

10.24 

10.48 

10.70 

10.89 

11.08 

11.24 

11.40 

11.55 

11.68 

11.81 

11.93 

9.10 

9.30 

9.49 

9.65 

9.81 

9.95 

10.08 

10.21 

10.32 

10.43 

10.54 

8.37 

8.55 

8.71 

8.86 

9.00 

9.12 

9.24 

9.35 

9.46 

9.55 

9.65 

7.87 

8.03 

8.18 

8.31 

8.44 

8.55 

8.66 

8.76 

8.85 

8.94 

9.03 

7.49 

7.65 

7.78 

7.91 

8.03 

8.13 

8.23 

8.32 

8.41 

8.49 

8.57 

7.21 

7.36 

7.48 

7.60 

7.71 

7.81 

7.91 

7.99 

8.07 

8.15 

8.22 

6.99 

7.13 

7.25 

7.36 

7.46 

7.56 

7.65 

7.73 

7.81 

7.88 

7.95 

6.81 

6.94 

7.06 

7.17 

7.26 

7.36 

7.44 

7.52 

7.59 

7.66 

7.73 

6.67 

6.79 

6.90 

7.01 

7.10 

7.19 

7.27 

7.34 

7.42 

7.48 

7.55 

6.54 

6.66 

6.77 

6.87 

6.96 

7.05 

7.12 

7.20 

7.27 

7.33 

7.39 

6.44 

6.55 

6.66 

6.76 

6.84 

6.93 

7.00 

7.07 

7.14 

7.20 

7.26 

6.35 

6.46 

6.56 

6.66 

6.74 

6.82 

6.90 

6.97 

7.03 

7.09 

7.15 

6.27 

6.38 

6.48 

6.57 

6.66 

6.73 

6.80 

6.87 

6.94 

7.00 

7.05 

6.20 

6.31 

6.41 

6.50 

6.58 

6.65 

6.72 

6.79 

6.85 

6.91 

6.96 

6.14 

6.25 

6.34 

6.43 

6.51 

6.58 

6.65 

6.72 

6.78 

6.84 

6.89 

6.09 

6.19 

6.29 

6.37 

6.45 

6.52 

6.59 

6.65 

6.71 

6.76 

6.82 

5.92 

6.02 

6.11 

6.19 

6.26 

6.33 

6.39 

6.45 

6.51 

6.56 

6.61 

5.76 

5.85 

5.93 

6.01 

6.08 

6.14 

6.20 

6.26 

6.31 

6.36 

6.41 

5.60 

5.69 

5.77 

5.84 

5.90 

5.96 

6.02 

6.07 

6.12 

6.17 

6.21 

5.45 

5.53 

5.60 

5.67 

5.73 

5.79 

5.84 

5.89 

5.93 

5.98 

6.02 

5.30 

5.38 

5.44 

5.51 

5.56 

5.61 

5.66 

5.71 

5.75 

5.79 

5.83 

5.16 

5.23 

5.29 

5.35 

5.40 

5.45 

5.49 

5.54 

5.57 

5.61 

5.65 
[Chart omitted. Axes: scale of r (= sample correlation coefficient) and scale of ρ (= population correlation coefficient).] 

* This chart is reproduced from F. N. David, Tables of the Ordinates and Probability 
Integral of the Distribution of the Correlation Coefficient in Small Samples, Cambridge 
University Press for the Biometrika Trustees, 1938, with permission of Professor E. S. 
Pearson. 




Table X 

Confidence Belts for the Correlation Coefficient ρ: p = .99 (cont.) 

[Chart omitted. Axis label: scale of r (= sample correlation coefficient).] 


INDEX 


Abelson, R. M., 619 
Acceptance sampling, 148 
Achenwall, G., 4 
Acton, F. S., 569 
Aitken, A. C., 569 
Alexander, H. W., 248 
Allan, F. E., 417, 448, 569 
Alternative hypothesis, 198 
Analysis of covariance, 561 
Analysis of variance, 312 
and use of effects, 408 
assumptions, 444 
factorial, 455, 460 
four-factor factorial, 474 
Latin square, 438 
linear regression, 512 
nested, 378, 383, 384 

one-way classification, 315, 323, 324, 368 
power, 324 

randomized block design: 
factorial experiment, 466 
subsampling, 433 

three-factor factorial, 468, 476, 477 
two-factor factorial, 460, 466, 475 
two-way classification, 398, 406, 423 
Anderson, R. L., 99, 192, 248, 290, 308, 
309, 364, 392, 448, 494, 528, 569, 
594, 619 

Anderson, S. L., 309, 449 
Anderson, T. W., 248, 569 
Arithmetic mean (see Mean) 

Arley, N., 128 
Aroian, L. A., 99 
Askovitz, S. I., 539, 569 
Aspin, A. A., 264, 265, 290 
Assumptions, regression, 506 
Attribute, 215 
Auble, D., 608, 619 


Banachiewicz, T., 569 

Bancroft, T. A., 99, 192, 248, 290, 294, 
308, 309, 364, 392, 448, 479, 494, 
569, 594 

Bartlett, M. S., 303, 308, 309, 392, 449, 
594, 619 

Bartlett’s test, 300 

Baten, W. D., 449 

Bayes' formula, 126 

Bayes' theorem, 126 

Bechhofer, R. E., 494 

Behrens-Fisher test, 264 

Bennett, C. A., 56, 99, 187, 192, 228, 248, 
307, 308, 309, 364, 449, 494, 570, 619 
Bennett, B. M., 494 
Berkson, J., 594 
Bernoulli, J., 5 

Bernoulli density function, 120 
Best fit, regression curve of, 499 
Beta, β, regression coefficient, 500 
Beta density function, 98 
Beta function, 97 
Binomial distribution, 22 
and hypergeometric distribution, 147 
and normal distribution, 203 
and Poisson distribution, 590 
histograms, 203 
index of dispersion, 590 
normal curve approximation, 204 
population, 64 
Birge, R. T., 570 
Birnbaum, A., 392 
Biserial correlation, 553 
Bivariate distribution: 
discrete, 93 
function, 94 

Bivariate normal density function, 92 
Bivariate normal distribution, 91 
Bliss, C. I., 274, 494 
Block factor, 395 




Bose, R. C., 342, 365, 444, 449, 494 

Box, G. E. P., 265, 290, 302, 309, 449, 494 

Bozivich, H., 479, 494 

Bradley, R. A., 611, 619 

Bross, I., 370, 392 

Brown, G. W., 619 

Brunk, H. D., 192, 619 

Bryan, J. G., 128 

Buch, K. R., 128 

Buffon needle problem, 127 

Bulmer, M. G., 392 

Burington, R. S., 66, 99, 203 

Burke, C. J., 595 

Burman. J. P., 496 


Carnap, R., 105, 128 
Categories, 9 

Cauchy density function, 66, 172 
Central limit theorem, 155 
Central tendency, 29 
Charlier, C. V. L., 99 
Chebyshev, P. L., 156 
Chebyshev's inequality, 154 
Chi-square distribution, 231 
and gamma distribution, 97 
mean and variance, 231 
moment generating function, 247 
table, 628, 629 
Chi-square statistic, 577 
Chi-square test: 
contingency table, 582 
goodness of fit, 576 
of independence, 582 
proportions, 589 
variances, 235, 300 
Churchman, C. W., 6 
Class, 13 
boundary, 14 
frequency, 14 
length, 14 
limits, 20 
mark, 14 

Clopper, C. J., 626, 627 
Cochran, W. G., 192, 248, 302, 308, 309, 
380, 392, 429, 439, 445, 449, 450, 
494, 496, 570, 594 

Coefficient of correlation (see Correlation 
coefficient) 

Coefficient of determination, 549 
Coefficient of skewness, 41 


Coefficient of variation, 36 
Column effects, 400 
Column mean, n, , 400 
Combination, 118 
linear, 336 
Comparison, 336 
Complete confounding, 482 
Completely randomized design, 377 
factorial in, 454 

Conditional density function, 169 
Conditional probability, 110, 111 
Confidence coefficient, 195 
Confidence interval, 195 
approximate for σ_α², 370 
difference in means, 206 
difference in proportions, 209 
linear combination, 341, 342 
mean, 197, 254 
parameters fit, 522 
proportion, 208 
Scheffé's method, 347 
slope, 507 
symmetric, 197 
variance, 233, 234 
Confidence intervals: 
m simultaneous, 342, 348 
treatment means, 316 
Confidence level, 195 

Confidence limits (see Confidence interval) 
Confounding, 480 
balanced partial, 483 
complete, 482 
partial, 483 

Connor, W. S., Jr., 490, 494 
Consistent estimator, 171 
Contingency table, 582 
Contour ellipse, 92 
Contrasts, 326 
independent, 340 
orthogonal, 340 
Control chart, 213 

Control charts, detecting nonrandomness, 
616 

Control limit, 214 
Converges in probability, 171 
Cornell, R. G., 533, 570 
Correction for bias, randomized block, 419, 
420 

Correlation coefficient, 91 
biserial, 553 
multiple, 553 





Correlation coefficient (cont.) 
partial, 553 
rank, 553 

relation to regression, 545-553 
sample, 547 
tetrachoric, 553 
Cauchy density function, 252 
Covariance, 90 
Covariance analysis, 553-567 
Cowden, D. J., 228 
Cox, D. R., 302, 309 

Cox, G. N., 6, 392, 429, 439, 445, 449, 494 

Cox, P. C, 541 

Craig, A. T., 248 

Craig, C. C, 54, 56, 98, 99 

Cramer, H., 128, 156, 192, 228, 248, 595 

Cramer’s rule, 518, 519 

Critical points, 202 

Critical region, 196 

Crump, S. L., 392 


Daniel, C., 494 
Darwin, C., 5 
Data: 

enumeration, 20 
grouped, 15 
raw, 12 

David, F. N., 449, 551, 570, 658 
Davies, H. M., 449 

Davies, O. L., 187, 192, 259, 299, 350, 
364, 392, 494 
Davis, H. T., 619 
Decile, 18 

Degrees of freedom, 231, 236 
Degrees of freedom identity, 287 
DeLury, D. B., 441, 449 
Deming, W. E., 570 
DeMoivre, A., 5, 156 
Density function: 

Bernoulli, 120 
beta, 98, 306 
Cauchy, 86, 172, 252 
conditional, 169 
continuous, 68 
dichotomous, 63 